Abstract


Simulation  models,  particularly  those  used  for  evaluation  of  real  world  policies  and 
practices, are growing in size and complexity. As the size and complexity of a model
increase, so do the time and resources needed to validate it. Multi-agent
network  models  pose  an  even  greater  challenge  for  validation  as  they  can  be  validated  at 
the  individual  actor,  the  network,  and/or  the  population  level.  Validation  is  crucial  for 
acceptance  and  use  of  simulations,  particularly  in  areas  where  the  outcomes  of  the  model 
will be used to inform real world decisions. There are, however, substantial obstacles to
validation.  The  nature  of  modeling  means  that  there  are  implicit  model  assumptions,  a 
complex  model  space  and  interactions,  emergent  behaviors,  and  uncodified  and 
inoperable  simulation  and  validation  knowledge.  The  nature  of  the  data,  particularly  in 
the  realm  of  complex  socio-technical  systems  poses  still  further  obstacles  to  validation. 
These  include  sparse,  inconsistent,  old,  erroneous,  and  mixed  scale  data.  Given  all  these 
obstacles,  the  process  of  validating  modern  multi-agent  network  simulation  models  of 
complex  socio-technical  systems  is  such  a  herculean  task  that  it  often  takes  large  groups 
of  people  years  to  accomplish.  Automated  and  semi-automated  tools  are  needed  to 
support  validation  activities  and  so  reduce  the  time  and  number  of  personnel  needed. 

This  thesis  proposes  such  a  tool.  It  advances  the  state  of  the  art  of  simulation 
validation by using knowledge and ontological representation and inference. Advances
are made at both the conceptual level and the implementation (tool) level.

A  conceptualization  is  developed  on  how  to  construct  a  reasoning  system  for 
simulation  validation.  This  conceptualization  sheds  light  on  the  relationships  between 
simulation  code,  process  logic,  causal  logic,  conceptual  model,  ontology,  and  empirical 
data  and  knowledge.  In  particular,  causal  logic  is  employed  to  describe  the  cause-and- 
effect  relationships  in  the  simulation  and  “if-then”  rules  closely  tied  to  the  cause-and- 
effect  relationships  encode  how  causal  parameters  and  links  should  change  given 
empirical  data.  The  actual  change  is  based  on  minimal  model  perturbations.  This 
conceptualization  facilitates  the  encoding  of  simulation  knowledge  and  the  automation  of 
validation.  As  a  side  effect,  it  also  paves  a  way  for  the  automation  of  simulation  model 
improvement. 

Based  on  this  conceptualization,  a  tool  is  developed.  This  tool,  called  WIZER  for 
What-If  Analyzer,  was  implemented  to  automate  simulation  validation.  WIZER  makes 
the  model  assumptions  explicit,  handles  a  complex  model  space  and  interactions, 
captures  emergent  behaviors,  and  facilitates  codification  and  computer-processing  of 
simulation  and  validation  data.  WIZER  consists  of  four  parts:  the  Alert  WIZER,  the 
Inference  Engine,  the  Simulation  Knowledge  Space  module,  and  the  Empirical/Domain 
Knowledge  Space  module. 

The Alert WIZER is able to characterize simulation data with assistance from
statistical tools it can semantically control, compare the data to the empirical data, and
produce symbolic or semantic categorizations of both the data and the comparison. The
Inference  Engine  is  able  to  perform  both  causal  and  “if-then”  rule  inferences.  The  causal 
inferences  capture  the  core  workings  of  the  simulations,  while  the  “if-then”  rule 
inferences  hint  at  which  model  parameters  or  links  need  change  given  the  symbolic 
categories  from  the  Alert  WIZER.  Both  kinds  of  rule  inferences  have  access  to  ontology. 
The Inference Engine takes the form of a forward-chaining production system, but with
knowledge-based  and  ontological  conflict  resolution.  It  performs  minimal  model 
perturbations  based  on  knowledge  bases  and  ontology.  The  perturbations  result  in  new 
parameter  values  and/or  meta-model  values  best  judged  to  move  the  simulator  closer  to 
validity  for  the  next  cycle  of  simulation.  Both  the  simulation  knowledge  space  and  the 
domain  knowledge  space  are  in  the  form  of  a  graph,  with  nodes  representing  entities, 
edges  representing  relationships,  and  node  attributes  representing  properties  of  the 
entities.  Knowledge-based  and  ontological  reasoning  is  performed  on  both  knowledge 
spaces.  A  simple  hypothesis  can  be  formed  by  search  and  inference  in  the  knowledge 
bases  and  ontologies. 

Several  validation  scenarios  on  two  simulation  models  are  used  to  demonstrate 
that  WIZER  is  general  enough  to  be  able  to  assist  in  validating  diverse  models.  The  first 
model is BioWar, a city-scale multi-agent social-network model of weaponized disease spread
in a demographically realistic population with naturally-occurring diseases. The empirical
data used for the WIZER validation of BioWar comes from the National Institute of
Allergy and Infectious Diseases and other sources. The second model is CONSTRUCT, a
model for the co-evolution of social and knowledge networks under diverse communication
scenarios. The empirical data used for the WIZER validation of CONSTRUCT comes
from Kapferer's empirical observations of the workers and management of a tailor shop
in Zambia.

The results of the BioWar validation exercise show that the simulated annual average
influenza  incidence  and  the  relative  timing  of  the  peaks  of  incidence,  school  absenteeism, 
and  drug  purchase  curves  can  be  validated  by  WIZER  in  a  clear  and  concise  manner.  The 
CONSTRUCT  validation  exercises  produce  results  showing  that  the  simulated  average 
probability  of  interaction  among  workers  and  the  relative  magnitude  of  the  change  of  the 
simulated  average  probability  of  interaction  between  different  groups  can  be  matched 
against  empirical  data  and  knowledge  by  WIZER.  Moreover,  the  results  of  these  two 
validation  exercises  indicate  the  utility  of  the  semantic  categorization  ability  of  the  Alert 
WIZER  and  the  feasibility  of  WIZER  as  an  automated  validation  tool.  One  specific 
CONSTRUCT validation exercise indicates that “what-if” questions are facilitated by
WIZER  for  the  purpose  of  model-improvement,  and  that  the  amount  of  necessary  search 
is  significantly  less  and  the  focus  of  that  search  is  significantly  better  using  WIZER  than 
using  Response  Surface  Methodology. 

Tools such as WIZER can significantly reduce the time for validation of large-scale
simulation systems. Such tools are particularly valuable in fields where multi-agent
systems  are  needed  to  model  heterogeneous  populations  and  diverse  knowledge,  such  as 
organizational  theory,  management,  knowledge  management,  biomedical  informatics, 
modeling  and  simulation,  and  policy  analysis  and  design. 
Chapter I: Introduction


Validation is a critical problem for the use of simulations in policy design and policy
making. Many crucial real world problems are complex and simulations provide a means
to understand them. Validation is a very different notion from verification. In validation,
the focus is on how to build the right product, while in verification the focus is on how to
build the product right. Except for simulations accredited via a labor-intensive process of
verification, validation, and accreditation (VV&A), most people do not trust simulation
results. Curiously enough, there is an additional step - the accreditation step - that needs
to be performed after the validation step in the VV&A process. If a simulation model is
certified valid, why is accreditation needed? This means the validation step is still
perceived to potentially produce invalid results or mismatches in application. Thus it is
crucial to get the validation process right.

Modeling and simulation is becoming a useful scientific tool. Unlike the scientific
problems of previous eras, most problems of consequence today are complex and rich in
data, rendering it less likely that a lone scientist with paper and pencil would be able to
solve them. This is particularly evident in the biomedical and social sciences. As the
complexity of modeling and simulation - and the size of simulations - increases,
assessing whether the models and simulations are valid is becoming an indispensable
element of the development process. Moreover, due to the size of the validation task, it is
necessary to have automated tools for the validation of models and simulations. Model
assessment - determining how valid and robust a model is - is becoming a major
concern. For example, NATO argued that identifying reliable validation methods for
electronic medical surveillance systems is a critical research area (Reifman et al. 2004).
From the policy maker's perspective, the main question is whether the simulation is valid
enough to answer the policy questions. Indeed, lack of confidence in the validity of
simulations leads to a debate over whether simulations mean anything substantial, or even
anything at all, as a basis for business and policy decisions. There are organizations
dedicated to doing VV&A, but there is a question of whether VV&A is objective, and
doing VV&A this way consumes a lot of time and resources. Here the automation of
validation comes into play. Automation requires that all assumptions and inferences be made
explicit and operable, and lends itself to the assessment of the robustness of simulation
scenarios.

One area of science that needs better modeling and simulation is the social sciences,
especially for societal modeling. Societal modeling is complex due to the many layers of
physical  reality  affecting  society  and  the  interactions  within  and  between  the  layers  - 
with  the  emergence  of  social  patterns  and  norms  from  the  interactions.  The  biological 
layer  of  physical  reality,  for  example,  includes  the  neural  basis  for  social  interaction 
(Frith  and  Wolpert  2004).  At  the  sociological  layer,  computational  modeling  and  analysis 
(Axelrod  1997,  Carley  and  Prietula  1999,  Epstein  and  Axtell  1996,  Prietula  et  al.  1998)  - 
including  the  simulation  component  -  has  emerged  as  a  useful  tool. 

Computational  modeling  and  analysis  can  handle  socio-technical  problems  with 
complex,  dynamic,  and  interrelated  parts,  such  as  natural  disaster  response  and  disease 
outbreak  response,  which  occur  within  a  context  constrained  by  social,  organizational, 
geographical,  regulatory,  financial,  and  other  factors.  It  can  handle  the  emergence  of 
social  patterns  from  individual  interactions.  Modeling  a  person  as  an  agent  and  social 
relationships  as  networks  is  part  of  computational  modeling.  The  former  takes  the  form  of 
multi-agent  models  (Weiss  1999,  Lucena  et  al.  2004,  Nickles  et  al.  2004,  Dastani  et  al. 
2004);  the  latter  takes  the  form  of  social  network  analysis  (Wasserman  and  Faust  1994). 
A  related  modeling  field  is  Artificial  Life  (Capcarrere  et  al.  2005),  which  deals  with  the 
processes  of  life  and  how  to  better  understand  them  by  simulating  them  with  computers. 

The  use  of  computational  modeling  and  analysis  has  increased  rapidly.  However, 
the  implicit  assumptions  and  abstractions,  changes  in  reality,  and  human  cognitive 
limitations  make  calibration,  verification,  validation,  and  model-improvement  to  assist 
computational  modeling  and  analysis  difficult  and  error-prone  when  performed  manually. 
1.1 Modeling, Simulations, and Inference


Most  emphasis  in  computational  modeling  and  analysis  is  on  employing  computers  in 
building  model  specifications,  verifying  the  code,  and  executing  simulation.  Indeed,  the 
notion  of  computational  modeling  and  analysis  usually  means  quantitative  models  run  on 
computers  and  inference/analysis  done  by  human  experts  on  the  results  of  the  computer 
runs.  Much  less  emphasis  is  given  to  employing  computers  to  help  automate  the 
inference,  validation,  model  improvement,  and  experiment  control.  Figure  1  depicts  this 
imbalance  of  automation,  which  this  dissertation  addresses.  In  the  figure,  the  dash-lined 
box  delineates  the  focus  of  this  dissertation.  Not  shown  is  the  possibility  of  automating 
simulation  control  and  experiment  design. 


Figure  1.  Automation  of  Inference,  Validation,  and  Model  Improvement 

Improved  data  gathering  and  computational  resources  mean  more  detailed 
simulation  models  can  be  built  and  run,  but  deciding  how  best  to  use  the  simulation, 
which produces a tremendous amount of data, is still being done manually. Indeed, we are
in a period of data-rich, inference-poor environments. Typically, simulation results are
designed solely for human analysis and validation is provided by subject matter experts
judging that the model “feels right” (face validity). While this may be sufficient for
small-scale simulations, it is inadequate for large high-fidelity simulations designed to
inform decision-makers. Expert systems (Durkin 1994) exist to codify subject matter
expert knowledge, but they are used separately outside the field of simulations (Kim
2005, National Research Council 2004). There is a knowledge acquisition bottleneck in
expert systems. Augmenting knowledge acquisition with inference from data is an active
area of research. A decade or so ago the computational intractability problems in
reasoning with logic rendered the knowledge-based approach unattractive. Recent research
advances in logic, however, have started reversing this trend.

While granting that human experts can be efficient and effective, the lack of
automated tools for analysis, validation, and model improvement - at least as
assistants to human experts - hinders speedier advancement in many fields, including the
socio-technical and biomedical fields. Recent advances in data mining have started to
make automated analysis common. A paradigm shift is needed: from focusing on design
and specification toward validation and model-improvement. (Validation can be thought
of as a bootstrap process for model-improvement.) Instead of focusing on the science of
design¹, a more fruitful focus might be on the science of simulated experiments, which is
to say, on the experimental approach (Edmonds and Bryson 2004).

Formal methods (Dershowitz 2004, Etessami and Rajamani 2005) are an alternative
to doing simulations or testing. A formal method provides a formal language for
describing a software artifact (e.g. specifications, designs, source code) such that formal
proofs are possible, in principle, about properties of the artifact. It is used for
specification, development, verification, theorem proving, and model checking. Formal
methods have had successes in the verification of software and hardware systems. The
verification of the AMD-K5 floating point square root microcode is one example. While
formal methods have been successfully used to produce ultra-reliable safety-critical
systems, they do not scale to large and complex systems. Most importantly, due to
their logical closed-world and mathematical/logical formality requirements, formal methods
cannot be used for validation.


¹ http://www.cs.virginia.edu/~sullivaii/sdsis


1.2  The  Approach 


This  dissertation  describes  a  knowledge-based  and  ontological  approach  for  doing 
validation  of  simulation  systems,  implemented  in  a  tool  called  WIZER  (What-If 
AnalyZER).  The  approach  allows  the  modeling  of  knowledge,  the  control  of  simulation, 
the  inferences  based  on  knowledge  and  simulation,  and  systematic  knowledge-based 
probes  and  adjustments  of  the  parameter,  model,  and  meta-model  spaces  for  validation. 

WIZER  handles  calibration,  verification,  and  validation  for  simulation  systems, 
with the side effect of facilitating rudimentary model-improvement. Calibration is part of
validation  and  validation  forms  a  basis  for  model-improvement.  Key  features  of  WIZER 
are  the  simulation  data  descriptor,  the  data  matcher  (which  matches  simulation  data 
descriptions  against  empirical  data),  the  inference  engine,  the  simulation  knowledge 
space,  and  the  empirical  knowledge  space.  Included  in  the  inference  engine  is  a 
parameter  value  modifier.  The  data  descriptor  and  data  matcher  form  a  component  of 
WIZER  called  Alert  WIZER,  which  produces  symbolic/semantic  categorizations  of  data 
and  of  data  comparison.  Statistical  routines  are  employed  in  the  data  descriptor  and  data 
matcher.  The  inference  engine  employs  rule-based,  causal,  and  ontological  reasoning. 
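
The division of labor between the data descriptor and the data matcher can be illustrated with a minimal Python sketch. It is not WIZER's implementation: the function names, the category labels, the weekly incidence values, and the empirical bounds are all hypothetical and chosen only for illustration. The descriptor reduces a simulated series to summary statistics, and the matcher turns a numeric comparison into a symbolic category that a rule-based engine could consume.

```python
from statistics import mean

def describe(series):
    """Data descriptor: reduce a simulated series to summary statistics."""
    return {"mean": mean(series), "peak_index": series.index(max(series))}

def categorize(value, low, high):
    """Data matcher: place a simulated value relative to an empirical range."""
    if value < low:
        return "below_empirical_range"
    if value > high:
        return "above_empirical_range"
    return "within_empirical_range"

# Hypothetical weekly incidence values from one simulation run.
simulated_incidence = [0.02, 0.05, 0.11, 0.07, 0.03]
summary = describe(simulated_incidence)

# Hypothetical empirical bounds on the mean incidence.
category = categorize(summary["mean"], low=0.04, high=0.08)
print(summary["peak_index"], category)
```

The symbolic label, rather than the raw number, is what gets passed on for inference in this sketch, which mirrors the role the text assigns to the Alert WIZER.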

WIZER  is  able  to  reduce  the  number  of  searches  that  need  to  be  performed  to 
calibrate  a  model,  improve  the  focus  of  these  searches,  and  thereby  facilitate  validation. 
Validation  is  achieved  by  performing  knowledge-based  search  in  parameter  and  model 
spaces.  Model-improvement  is  achieved  by  performing  search  in  meta-model  space,  after 
the  comparison  of  simulation  model  and  knowledge  against  target/empirical  knowledge. 
Knowledge-based  hypothesis  building  and  testing  is  employed  to  help  reduce  the  amount 
of  search. 

One  of  the  currently  active  areas  of  research  in  Artificial  Intelligence  is  in 
integrating  deductive  logic  (including  propositional  logic  and  first-order  logic)  and 
probabilistic reasoning. The brittleness of first-order (symbolic) logic has led to the
popularity of statistics - particularly Bayesian statistics - as the preferred Artificial
Intelligence method. Indeed, Bayes rule forms the core of the probabilistic algorithms (Thrun
et al. 2005) behind Stanley, the driverless car that traversed 132 miles of Southwest desert
and won the 2005 DARPA Grand Challenge. The statistical approach, however, has an
inherent weakness of being unable to support the structures of domain knowledge and the
fertile inferences of logic. Behind the winning probabilistic algorithm of Stanley, there
was a critical logical inference that the short range laser vision should be used to train the
longer range camera vision. The belief driving logic and probabilistic integrative research
(probabilistic logic) in Artificial Intelligence is that logic and probability are sufficient for
representing the real world. The approach underlying WIZER indicates what is missing
in this view: the importance of modeling and simulation, the significance of hypothesis
building and testing, and the need to focus on natural processes instead of just pure logic.
WIZER combines the power of logic, the expressiveness of modeling and simulation, and
the robustness of statistics. In addition to mathematics, simulation is a tool capable of
representing processes with high fidelity. Intertwining previously separate simulation and
knowledge inference, the force behind WIZER, shows a way to have validated simulation
that is capable of representing processes with high fidelity and that has knowledge inference
(and explanation) capability.

Changing part of the structure of social and agent-based simulations may fit into
the verification problem if we have either a complete logically-clean conceptual model or
logically-clean conceptual models against which the simulation can be compared. (An
incomplete model does not meet the closed world requirements of logical systems.) If we
compare the simulation against the empirical/domain data and knowledge, however,
changing the simulation becomes part of the validation process. This is an important
distinction. Depending on the nature of data, changing the simulation model can be part
of verification or validation. If the empirical data is logical and computational (this is rare
in the real world, except for some engineering and scientific fields such as electronic
engineering) such that a logically-clean conceptual model can be constructed from and
verified against it, the changing of the simulation model is part of the verification process.
Formal methods can be used for this verification process. If the empirical data is
noncomputational or not logically-clean, which is the case for social sciences, the
changing of the simulation model becomes part of the validation process as it must be
compared against empirical data and knowledge in addition to the conceptual model (the
conceptual model itself must be empirical and not necessarily logical). The Alert WIZER
can be used to pinpoint the part of the simulation model that must be changed given
empirical evidence. If the Alert WIZER cannot match the parameters without changing
the model, it can show the mismatched parameters as the starting point for model change.
For example, in the BioWar simulator (Carley et al. 2003), if the influenza incidence
curve matches the empirical curve well, but the number of influenza strains greatly
exceeds that of the empirical reality, then the Alert WIZER will show that there is a
potential model error related to the number of influenza strains. This is part of validation
and model improvement. WIZER can be used in many ways: for validation, for
pinpointing model discrepancies, for semantic categorization of data, and for model
improvement.
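
The influenza-strain example can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might look, not the Alert WIZER's actual logic: the output names, the simulated values, and the empirical ranges below are all hypothetical. The point is only that outputs which match the empirical ranges are left alone, while those that do not are reported as starting points for model change.

```python
def find_mismatches(simulated, empirical_ranges):
    """Return names of outputs whose simulated value lies outside the empirical range."""
    mismatched = []
    for name, value in simulated.items():
        low, high = empirical_ranges[name]
        if not (low <= value <= high):
            mismatched.append(name)
    return mismatched

# Hypothetical simulator outputs: incidence looks plausible, strain count does not.
simulated = {"annual_incidence": 0.12, "num_strains": 40}

# Hypothetical empirical ranges for the same quantities.
empirical_ranges = {"annual_incidence": (0.05, 0.20), "num_strains": (1, 5)}

flags = find_mismatches(simulated, empirical_ranges)
print(flags)
```

In this sketch the incidence passes and only the strain count is flagged, which corresponds to the "potential model error" case described above.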
1.3 Contributions


This dissertation provides a new conceptualization for how to do automated validation of
simulations, particularly agent-based simulations, and implements a tool, WIZER,
that is consistent with this conceptualization. The conceptualization is based on a
knowledge-based and ontological approach and it sheds light on the relationships
between  simulation  code,  process  logic,  causal  logic,  conceptual  model,  ontology,  and 
empirical  data  and  knowledge.  The  tool  WIZER  is  implemented  in  four  parts:  the  Alert 
WIZER,  the  Inference  Engine,  the  Simulation  Knowledge  Space,  and  the  Domain 
Knowledge  Space.  The  Alert  WIZER  can  do  semantic  categorizations  of  simulation  data 
and  of  the  comparisons  between  simulation  and  empirical  data,  with  the  support  of 
statistical  tools  it  semantically  controls.  Using  the  semantic  categories  produced  by  the 
Alert  WIZER,  the  Inference  Engine  can  perform  causal,  “if-then”,  and  ontological 
reasoning,  and  determine  new  parameter  values  best  judged  to  move  the  simulation  closer 
to validity. This thesis presents several knowledge-based measures of validity. The Simulation
Knowledge  Space  and  the  Domain  Knowledge  Space  support  the  explicit  encoding  and 
computer  processing  of  simulation  and  domain  knowledge,  respectively,  in  the  form  of 
causal  rules,  “if-then”  rules,  and  ontology.  They  also  assist  the  determination  of  new 
parameter  values  by  the  Inference  Engine.  Several  validation  scenarios  done  on  two 
simulation  models,  BioWar  and  CONSTRUCT,  indicate  the  feasibility  and  applicability 
of  WIZER  for  automated  validation  of  simulations. 
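
As a rough illustration of how semantic categories could drive rule inference, the following Python sketch implements a tiny forward-chaining loop. The abstract describes the Inference Engine as a forward-chaining production system with knowledge-based and ontological conflict resolution; here a plain numeric priority stands in for that conflict resolution, which is a simplifying assumption. The rule names, fact strings, and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

@dataclass
class Rule:
    name: str
    condition: Callable[[Set[str]], bool]  # tests the current fact set
    conclusion: str                        # symbolic fact asserted when the rule fires
    priority: int = 0                      # stand-in for knowledge-based conflict resolution

def forward_chain(initial_facts: Iterable[str], rules: List[Rule]) -> Tuple[Set[str], List[str]]:
    """Fire applicable rules one at a time, highest priority first, until none remain."""
    facts = set(initial_facts)
    fired: List[str] = []
    while True:
        candidates = [r for r in rules if r.name not in fired and r.condition(facts)]
        if not candidates:
            return facts, fired
        chosen = max(candidates, key=lambda r: r.priority)
        facts.add(chosen.conclusion)
        fired.append(chosen.name)

rules = [
    # Causal rule: high incidence is attributed to high transmission.
    Rule("causal_link", lambda f: "incidence_above_empirical_range" in f,
         "transmission_too_high", priority=1),
    # "If-then" rules: suggest which parameter to change.
    Rule("adjust_transmission", lambda f: "transmission_too_high" in f,
         "decrease:transmission_rate", priority=2),
    Rule("adjust_seed", lambda f: "incidence_above_empirical_range" in f,
         "decrease:initial_infected", priority=0),
]

facts, fired = forward_chain({"incidence_above_empirical_range"}, rules)
print(fired)
```

Starting from a single symbolic category, the loop first fires the causal rule and then the parameter-adjustment rules, ending with suggested changes that a later simulation cycle could test.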

In  a  nutshell,  the  contributions  of  this  dissertation  are: 

1. A novel approach for doing validation of simulations. This includes a knowledge-based
and ontological method utilizing the inference engine and a new method to
do simple hypothesis formation and testing in simulations utilizing
symbolic/ontological/knowledge-based information, instead of just doing
permutation, parametric, and bootstrap tests (Good 2005).

2.  WIZER,  an  automated  validation  tool  implementing  the  above  knowledge-based 
and ontological approach to validation. This includes the Alert WIZER, which is
capable of symbolic categorizations of data and of semantic control of statistical
routines.

3. A demonstration that WIZER can reduce the amount of search and focus the search,
utilizing knowledge-based and ontological reasoning.

4. Partial validation of the BioWar and CONSTRUCT simulators. Full validation is a
major project in its own right.

5.  A  novel  conceptualization  combining  modeling,  simulation,  statistics,  and 
inference  for  a  unified  Artificial  Intelligence  reasoning  construct.  Until  now, 
simulation  was  considered  to  be  separate  from  Artificial  Intelligence.  Logic, 
simulation  (and  thus  processes),  and  probability/statistics  are  intertwined  in  the 
conceptualization.  This  allows  the  brittleness  of  logic  to  be  ameliorated  by 
simulation-mediated statistical reasoning. Furthermore, this lets knowledge-less
statistical reasoning be grounded in the simulation model/structure.

6.  A  novel  knowledge-based  and  ontology-based  augmentation  to  simulation.  This 
enables  inference  and  control  of  simulation,  including  those  of  simulation 
statistical  tools.  Knowledge  management  and  strategic  planning  in  organizations 
and  businesses  can  be  enhanced  by  knowledge-augmented  and  validated 
simulations. 

7.  A  novel  description  logic  and  ontology  reasoning  for  simulations,  which  I  call 
Simulation Description Logic (SDL). This is inspired by the ontology and inference
languages DAML+OIL, RDF, and RuleML. SDL allows the descriptions of
simulation  models,  simulation  results,  and  statistical  tools  used  to  analyze  the 
results.  Based  on  the  descriptions,  the  knowledge  inference  is  performed.  SDL 
paves  a  way  toward  the  Simulation  Web. 

This  dissertation  touches  upon  a  central  problem  in  many  fields  of  research  and 
application  -  how  to  build  models,  do  simulation,  do  model  verification  and  validation, 
perform  inferences,  and  improve  on  them.  As  a  result,  there  are  a  number  of  audiences 
that  can  benefit  from  the  work  herein,  including: 
Simulation Modelers. The field of modeling and simulation conventionally regards the
inference  or  analysis  work  as  the  domain  of  human  experts  with  minimal 
assistance  from  computer  tools.  Normally  only  statistical  packages  and  data 
mining  tools  are  used  to  assist  human  experts.  WIZER  provides  an  automated  tool 
to  do  knowledge-based  and  ontological  reasoning  for  validation.  As  a  side  effect, 
model  improvement  is  facilitated  by  WIZER  through  a  simple  knowledge-based 
and ontological hypothesis formation and testing. WIZER thus adds a
reasoning-capable tool to the modeler's repertoire. In short, WIZER adds the
automated inference component to modeling, simulation, and human analysis.

Policy  Designers.  The  integration  of  simulation  and  inference  advocated  by  this 
dissertation  allows  the  simulation  and  inference  of  many  policy  problems.  Current 
policy  deliberations  use  math  models  (economic  models  are  popular)  and  simple 
simulations.  Most  policy  designs  are  based  on  meticulous  examinations  of  the 
nature  of  the  problem,  issues,  options,  and  cost/benefit  of  options  by  human 
policy  experts.  Validated  simulations  serving  as  an  important  tool  of  policy  are 
uncommon. WIZER provides a means to automate simulation validation, thus
making validated simulations more common. Validated simulations with coupled
knowledge bases and inference would help greatly in the integrative treatment of the
multiple aspects of a problem. By virtue of its knowledge and ontological inferences,
WIZER assists in this regard too.

Computer  Scientists.  The  field  of  Computer  Science  is  transitioning  towards  handling 
more  real  world  problems.  As  a  result,  domain  knowledge  from  other  fields 
including  physics,  biology,  sociology,  and  ecology  is  becoming  more  important. 
The  reasoning  algorithms  in  Computer  Science  and  Artificial  Intelligence  must 
evolve  as  more  interdisciplinary  challenges  are  encountered.  No  longer  is  it 
sufficient  to  use  simple  Bayesian  reasoning  with  its  conditional  dependence 
assumption  of  the  known  information.  Now  it  is  necessary  to  incorporate  domain 
knowledge  via  more  sophisticated  reasoning  algorithms.  It  is  becoming  crucial  to 
be able to represent real world processes. Representing real world processes - and
cause-effect relations - can be done with simulations, in addition to mathematics.
WIZER  can  validate  such  simulations  and  integrate  knowledge  inference  and 
simulation.  It  makes  domain  knowledge  and  simulation  knowledge  explicit  and 
operable,  which  is  to  say,  suitable  for  automated  or  computer  processing. 

Epidemiologists. As the field of epidemiology considers spatial and sociological aspects
of disease spread, it is inevitable that more sophisticated and complex models
upon which epidemiologists can rely to compute and predict the spread of
diseases will appear. Spatial epidemiology is now a relatively mature field, but
“social” epidemiology is not. This dissertation brings forward an automated
validation  of  a  multi-agent  social-network  model  of  disease  spread  called  BioWar. 
A  multi-agent  social-network  model  is  an  appropriate  tool  for  modeling  social 
interactions  and  phenomena.  In  BioWar,  it  is  shown  that  anthrax  and  smallpox 
can  be  simulated  agent-by-agent  and  the  resultant  population  behavior  and  disease 
manifestations  mimic  those  of  the  conventional  Susceptible-Infected-Recovered 
(SIR)  model  of  disease  spread.  WIZER,  the  automated  validation  tool  of 
simulations, allows epidemiologists to build, validate, and use more complex
models of disease spread that take into account social, geographical, financial, and
other  factors.  It  helps  make  prognosis,  planning,  and  response  more  accurate,  thus 
saving  lives. 


Social  Scientists.  Multi-agent  modeling  and  simulation  is  becoming  a  preferred  tool  to 
examine  social  complexity.  The  software  to  do  meaningful  social  inquiry  is 
usually  complex,  due  to  the  social  interactions  and  the  emergence  of  social 
patterns.  WIZER  provides  the  automation  tool  for  the  validation  of  social 
software, particularly the multi-agent social-network software.


1.4 Outline


This thesis research is presented in thirteen chapters, organized into three parts: 1)
conceptualization  and  theoretical  justification,  2)  implementation,  experiments,  and 
results,  and  3)  discussion  and  future  work. 

Chapter  1  introduces  the  reader  to  the  background,  the  rationale,  the  approach,  and  the 
contributions  of  this  research. 

Chapter 2 describes related work in validation and model-improvement.

Chapter 3 describes inference techniques in artificial intelligence and the
scientific method, and shows the need for a new kind of inference. Empirical reasoning and
knowledge-based  hypothesis  building  and  testing  are  shown  as  a  good  choice  for  a  new 
inference  mechanism. 

Chapter  4  contains  the  description  of  WIZER.  This  includes  the  description  of  Alert 
WIZER,  the  Inference  Engine,  and  the  knowledge  spaces.  It  also  describes  in  detail  the 
reasoning  mechanisms  in  the  Inference  Engine,  which  includes  rule-based  reasoning  and 
hypothesis formation and testing. It describes the use of a novel simulation description
logic to describe the simulation results and the statistical tools.

Chapter  5  explains  the  evaluation  criteria  for  validation  and  model-improvement  along 
with  the  metrics. 

Chapter  6  describes  the  BioWar  testbed,  the  experimental  setup  for  it,  the  runs,  and  the 
results.  BioWar  (Carley  et  al.  2003)  is  a  city-scale  spatial  social  agent  network  model 
capable of simulating the effects of weaponized biological attacks against the background
of naturally-occurring diseases on a demographically-realistic population. Included is the
description of empirical data used to validate BioWar.

Chapter 7 describes the CONSTRUCT testbed, its experimental setup, the runs, and the
results. CONSTRUCT (Carley 1991, Schreiber and Carley 2004) is a multi-agent model
of group and organizational behavior, capturing the co-evolution of cognition
(knowledge) and structure. The empirical data used to validate CONSTRUCT is
Kapferer’s Zambia tailor shop data of workers and management interactions.

Chapter 8 describes the strengths and weaknesses of the current WIZER and potential
improvements. This includes a comparison between WIZER and Response Surface
Methodology and a comparison between WIZER and the subject matter experts
approach. The reasoning mechanisms in WIZER could be improved further. This chapter
also describes how WIZER can work together with existing tools in COS such as
AutoMap, ORA, and DyNet.

Chapter 9 positions WIZER and its contributions from a Computer Science perspective,
using Computer Science and Artificial Intelligence terminology.

Chapter  10  describes  the  relationships  between  causality,  simulation,  and  WIZER.  It 
advances  the  use  of  validated  simulations  as  a  better  way  to  examine  causality  and  to 
perform causal inferences. This chapter also contains the construction of process logic
and ontology to describe processes and mechanisms crucial for any causal relation.

Chapter 11 explores the potential extensions and implications of WIZER. First, it probes
and describes potential extensions of the work. These include: 1) the work toward the
realization of the Simulation Web, the potential next step of the Semantic Web, 2) the
work toward super-simulations, and 3) the work toward creating a knowledge assistant and
knowledge-assisted communication. Second, it explains the potential implications of
WIZER in wider fields, including Policy Analysis and Design, Organization and
Management, Biomedical Informatics, and Bioinformatics/Computational Biology. In
particular, WIZER enhances knowledge management in many fields with validated
simulation enabled by its validation automation capability.

Chapter 12 contains the description of WIZER code and a guide for the configuration
and use of WIZER.

Chapter 13 summarizes the contributions, limitations, and potential extensions to this
research.

Appendix A describes the field of modeling and simulation and conventional simulation
approaches, and shows what and how WIZER contributes to the field. It also describes
how simulation models can be learned from data.

Appendix B shows how WIZER can augment system dynamics by knowledge
representation, inference, and control of system dynamics models.

Appendix  C  contains  the  ontology  and  knowledge  base  for  the  BioWar  simulator. 

Appendix D has the ontology and knowledge base for the CONSTRUCT simulator.


1.5 Definition of Terms


The following are the definitions of terms related to this research.

Verification: a set of techniques for determining whether the programming
implementation of the abstract or conceptual model is correct (Xiaorong 2005).

Validation: a set of techniques for determining whether the conceptual model is a
reasonably accurate representation of the real world (Xiaorong 2005). Model
validation is achieved through the calibration of the model until model accuracy is
acceptable.

Calibration: an iterative process of adjusting unmeasured or poorly characterized model
parameters or models to improve the agreement with empirical data (Xiaorong
2005).

Accreditation: a certification process by an independent/official agency (Balci 1998)
which  is  partly  subjective  and  often  includes  not  only  verification  and  validation 
but  items  such  as  management  policy,  documentation,  and  user  interface. 

Training: procedures for supplying data and feedback to computational learning models.

Model improvement: a set of techniques to enhance the model relative to the epistemic
and empirical knowledge of the problem of interest.
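
The definition of calibration given above (iteratively adjusting a poorly characterized parameter until agreement with empirical data is acceptable) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy growth model `simulate`, the made-up observations, the candidate grid, and the tolerance are hypothetical and are not taken from WIZER or BioWar.

```python
def simulate(rate, steps):
    """Hypothetical toy model: case counts growing by a fixed rate per step."""
    cases = [1.0]
    for _ in range(steps - 1):
        cases.append(cases[-1] * (1.0 + rate))
    return cases


def squared_error(predicted, observed):
    """Sum of squared differences between model output and empirical data."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def calibrate(observed, candidate_rates, tolerance):
    """Try each candidate parameter value and keep the one with the lowest error.

    Returns the best value, its error, and whether that error is acceptable.
    """
    best_rate, best_err = None, float("inf")
    for rate in candidate_rates:
        err = squared_error(simulate(rate, len(observed)), observed)
        if err < best_err:
            best_rate, best_err = rate, err
    return best_rate, best_err, best_err <= tolerance


# Made-up "empirical" series, generated with a growth rate of 0.5.
observed = [1.0, 1.5, 2.25, 3.375]
rates = [i / 10 for i in range(11)]  # candidate values 0.0, 0.1, ..., 1.0
best_rate, best_err, acceptable = calibrate(observed, rates, tolerance=1e-6)
```

A grid search is only one possible adjustment strategy; it is used here because it keeps the iterate-and-compare loop visible.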

Unless mentioned otherwise, the term validation in this dissertation will denote
calibration, validation, and model-improvement.


Chapter II: The Need for a New Approach


Validation has been addressed using different approaches from many fields. I elaborate
on these below and point to a promising new approach to the problem of validation.
Validation is not to be confused with verification. The latter deals with how to build a
product right, while the former concerns itself with how to build the right product, which is
a far more important and difficult problem. Validation is also different from diagnosis, as
the former seeks to ascertain whether a model is correct, while the latter probes what
causes a malfunction in parts of a model given that the model is correct.


2.1  Related  Work 


Verification  and  validation  can  theoretically  be  performed  by  utilizing  formal  methods 
(Weiss  1999,  Dershowitz  2004,  Davies  et  al.  2004,  Bertot  and  Casteran  2004,  Hinchey  et 
al.  2005,  Fitzgerald  et  al.  2005)  if  a  formal  specification  of  validity  exists.  A  formal 
method  is  a  method  that  provides  a  formal  language  for  describing  specifications, 
designs,  and  source  code  such  that,  in  principle,  formal  proofs  are  possible.  Formal 
methods  can  be  categorized  into  “traditional”  formal  methods  which  are  used  for  design 
verification  and  algorithm/code  verification,  and  “lightweight”  formal  methods  which  are 
used  for  requirements  “validation”  and  conceptual  model  “validation”,  that  is,  analyzing 
assumptions, logic, and structure. They are not yet applicable to “validation” at the run-time
level  and  the  empirical  level.  Formal  methods  depend  on  denotational,  operational,  and 
axiomatic  semantics.  The  value  of  formal  methods  is  that  they  provide  a  means  to 
symbolically  examine  the  entire  state  space  and  establish  a  correctness  property  that  is 
true for all possible inputs. Formal methods are used in specification, in development
and verification, and in automated provers. Automated provers include:

o Automated theorem proving, which produces a formal proof from scratch, given a
description of the system, a set of logical axioms, and a set of inference rules,
o Model checking, which verifies properties by means of an exhaustive search of all
possible states that could be entered during execution.
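
As an illustration of the exhaustive state search that model checking performs, the following is a minimal sketch of explicit-state checking of an invariant by breadth-first search. The two-process toy model, the state names, and the function names are hypothetical and chosen only for this example; production model checkers use far more sophisticated state representations.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Visit every reachable state breadth-first and test the invariant in each.

    Returns (True, None) if the invariant holds everywhere, otherwise
    (False, path) where path is a shortest trace to a violating state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return False, path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


# Hypothetical toy model: two processes cycle idle -> waiting -> critical -> idle
# with no lock, so both can be in the critical section at once.
STEP = {"idle": "waiting", "waiting": "critical", "critical": "idle"}


def successors(state):
    for i in range(2):
        nxt = list(state)
        nxt[i] = STEP[state[i]]
        yield tuple(nxt)


def mutual_exclusion(state):
    return state != ("critical", "critical")


holds, counterexample = check_invariant(("idle", "idle"), successors, mutual_exclusion)
```

Because the search is breadth-first, a reported counterexample is a shortest trace from the initial state to the violation.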

Neither  of  these  techniques  works  without  human  assistance.  Automated  theorem  provers 
usually  require  human  inputs  as  to  which  properties  to  pursue,  while  model  checkers  have 
the  characteristic  of  getting  into  numerous  uninteresting  states  if  the  model  is  sufficiently 
abstract.  However,  while  formal  methods  have  been  applied  to  verify  safety  critical 
systems,  they  are  currently  not  scalable  to  reasonably  complex  simulations.  In  addition  to 
relying  on  logic  and  automata  (finite  state  machines),  formal  methods  rely  on  specified 
“truths”,  ignoring  the  empirical  nature  of  reality.  They  also  rely  on  a  limited  set  of 
semantics,  ignoring  natural  processes  and  causality.  A  formal  proof  of  correctness,  if 
attainable,  would  seem  to  be  the  most  effective  means  of  model  verification  and 
validation,  but  this  impression  is  wrong.  Indeed,  formal  methods  can  prove  that  an 
implementation  satisfies  a  formal  specification,  but  they  cannot  prove  that  a  formal 
specification  captures  a  user's  intuitive  informal  expectation  and/or  empirical  foundations 
for a system. Furthermore, non-computational data inherent in the validation process
cannot be properly handled by formal methods, which require strict logical
representation. In other words, formal methods can be used to verify a system, but not to
validate  a  system.  The  distinction  is  that  validation  shows  that  a  product  will  satisfy  its 
user-desired  mission,  while  verification  shows  that  each  step  in  the  development  satisfies 
the  requirements  imposed  by  previous  steps.  Contrary  to  intuition,  forcing  formality  on 
informal  application  knowledge  may  in  fact  hinder  the  development  of  good  software. 
Successful  projects  are  often  successful  because  of  the  role  of  one  or  two  key  exceptional 
designers.  These  designers  have  a  deep  understanding  of  the  application  domain  and  can 
map  the  application  requirements  to  software. 

In  software  engineering  (Pressman  2001),  “validation”  of  multi-agent  systems  is 
done  by  code-“validation”,  which  means  the  determination  of  the  correctness  of  the 
software  with  respect  to  the  user  needs  and  requirements.  In  contrast,  my  concern  is  with 
empirical in addition to epistemic validation. In principle, if - this is a big if - the
real-world problems could be specified formally, then formal methods could be applied.
However, formal methods (Dershowitz 2004, Davies et al. 2004, Bertot and Casteran
2004, Hinchey et al. 2005, Fitzgerald et al. 2005) used in software engineering for the
control and understanding of complex multi-agent systems lack an effective means of
determining if a program fulfills a given formal specification, particularly for very
complex problems (Edmonds and Bryson 2004). Societal problems include complex
communication patterns (Monge and Contractor 2003), messy interactions, dynamic
processes, and emergent behaviors, and thus are so complex that applying requirements
engineering and/or formal methods is currently problematic. Still, formal methods have
value in requirements “validation”, not least by virtue of their precise specification, which
could reveal ambiguities and omissions and improve communications between software
engineers and stakeholders.

Evolutionary  verification  and  validation  or  EVV  (Shervais  et  al.  2004,  Shervais 
and Wakeland 2003) can also be applied to multi-agent social-network systems. EVV
utilizes  evolutionary  algorithms,  including  genetic  algorithms  (Deb  et  al.  2004)  and 
scatter  search,  for  verification  and  validation.  While  EVV  allows  testing  and  exploitation 
of  unusual  combinations  of  parameter  values  via  evolutionary  processes,  it  employs 
knowledge-poor  genetic  and  evolutionary  operators  rather  than  the  scientific  method,  for 
doing  experiments,  forming  and  testing  hypotheses,  refining  models,  and  inference, 
precluding  non-evolutionary  solutions  and  revolutionary  search/inference  steps. 

Docking  -  the  alignment  of  possibly-different  simulation  models  -  is  another 
approach  to  validating  multi-agent  systems  (Axtell  et  al.  1996).  Alignment  is  used  to 
determine  whether  two  simulation  models  can  produce  the  same  results,  which  in  turn  is 
the  basis  for  experiments  and  tests  of  whether  one  model  can  subsume  another.  The  more 
models  align,  the  more  they  are  assumed  to  be  valid,  especially  if  one  (or  both)  of  them 
has  been  previously  validated.  The  challenges  in  applying  docking  are  the  limited  number 
of  previously  validated  models,  the  implicit  and  diverse  assumptions  incorporated  into 
models,  and  the  differences  in  data  and  domains  among  models.  Two  successful 
examples  of  docking  are  the  alignment  of  the  anthrax  simulation  of  BioWar  against  the 
Incubation-Prodromal-Fulminant (IPF) mathematical model, a variant of the well-known
Susceptible-Infected-Recovered (SIR) epidemiological model (Chen et al. 2006), and the
alignment of BioWar against an SIR model of smallpox (Chen et al. 2004). While
aligning a multi-agent model with a mathematical model can show the differences and
similarities between these two models, the validity it provides is limited by the type and
granularity of data the mathematical model uses and by the fact that symbolic
(non-numerical) knowledge is not usually taken into consideration.

Validating multi-agent social-network simulations by statistical methods alone
(Jewell 2003) is problematic because the granularity required for the statistical methods
to operate properly is at a sample population level and the sample has homogeneity
assumptions. Much higher granularity and heterogeneity can be achieved using
knowledge-based validation. Statistics averages over individuals. Individual importance
and eccentricity hold little meaning for a population from the statistical point of view.
Moreover, statistical methods cannot usually deal with symbolic - instead of numeric -
data and cause-and-effect relationships.

Human subject matter experts (SMEs) can validate computational models by
focusing on the most relevant part of the problem and thinking about the problem
intuitively and creatively. Applying learned expertise and intuition, SMEs can exploit
hunches and insights, form rules, judge patterns, analyze policies, and assess the extent to
which the model and their judgments align. To deal with large-scale simulations, SMEs’
effectiveness can be enhanced with computer help. Managed and administered properly,
SMEs can be effective. The Archimedes model of diabetes is an example of successful
validation by SMEs assisted by statistical tools (Eddy and Schlessinger 2003). However,
human judgment based validation is subject to pitfalls such as bounded rationality, biases,
implicit reasoning steps, and judgment errors. Moreover, the fact that validation
knowledge is often not explicitly stated and encoded hinders the validation process.
When SMEs evaluate the results of the changes they suggested earlier, some results may
be wrong. Pinpointing exactly where in the process the error occurs is difficult due to the
above implicit assumptions and sometimes ambiguous statements. Even if the validation
knowledge is explicit, it is not structured and codified for automation by computer.

Another approach to validation is direct validation with real world data (empirical
validation) and knowledge (epistemic validation). Validation can be viewed as
experimentation with data and knowledge, and models as infrastructure or lab equipment
for doing computational experiments or simulations (Bankes 2004). Simulation (Law and
Kelton 2000, Rasmussen and Barrett 1995) has an advantage over statistics and formal
systems  as  it  can  model  the  world  as  closely  as  possible  (e.g.,  modeling  emergence),  free 
of  the  artifacts  of  statistics  and  formal  systems.  Direct  validation  requires  a  number  of 
virtual  experiments  be  run  using  the  simulator.  The  results  from  these  experiments  are 
then  compared  with  the  real  data.  Two  techniques  for  this  comparison  are  Response 
Surface  Methodology  (Myers  and  Montgomery  2002)  and  Monte  Carlo  simulations 
(Robert  and  Casella  1999).  These  two  approaches,  however,  can  only  be  used  for 
numerical  data  and  are  limited  to  a  small  number  of  dimensions. 
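
The direct-validation loop described above (run a number of virtual experiments, then compare the results with real data) might be sketched as follows. This is a minimal Monte Carlo illustration under stated assumptions: `run_virtual_experiment` is a hypothetical stand-in for one simulator run, the empirical count is made up, and the two-standard-deviation acceptance rule is an arbitrary choice for the sketch, not a criterion taken from this dissertation.

```python
import random
import statistics


def run_virtual_experiment(infection_prob, population, rng):
    """Hypothetical stand-in for one simulator run: count infected agents."""
    return sum(1 for _ in range(population) if rng.random() < infection_prob)


def direct_validation(empirical_count, infection_prob, population, runs, seed=0):
    """Compare an empirical value with the distribution of simulated outcomes.

    Returns (accepted, mean, spread), where accepted is True when the empirical
    value lies within two sample standard deviations of the simulated mean.
    """
    rng = random.Random(seed)  # fixed seed so the virtual experiments are repeatable
    outcomes = [
        run_virtual_experiment(infection_prob, population, rng) for _ in range(runs)
    ]
    mean = statistics.mean(outcomes)
    spread = statistics.stdev(outcomes)
    return abs(empirical_count - mean) <= 2 * spread, mean, spread


# Made-up empirical observation: 100 infections in a population of 1000.
accepted, mean, spread = direct_validation(100, 0.1, 1000, runs=200)
```

The sketch compares a single numerical output, which reflects the limitation noted above: such comparisons handle numerical data in a few dimensions and say nothing about symbolic knowledge.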

An interesting and somewhat related work is the extension of the C++ language with
a programming construct for rules called R++ (Crawford et al. 1996). A rule is a
statement  composed  of  a  condition,  the  left-hand  side  (LHS),  and  an  action,  the  right- 
hand  side  (RHS),  that  specifies  what  to  do  when  the  condition  becomes  true.  R++  rules 
are  path-based,  which  means  the  rules  are  restricted  to  the  existing  object-oriented 
relationships,  unlike  data-driven  rules.  R++  however  is  not  available  in  the  public 
domain.  On  June  16,  1998,  Patent  Number  5768480  (“Integrating  Rules  into  Object- 
Oriented  Programming  Systems”)  was  issued  to  Lucent  Technologies  for  R++,  but  the 
production version of R++ is owned by AT&T. The legal complications of figuring out
who owns R++ and licensing issues due to the AT&T and Lucent breakup have meant that
R++ is not available commercially or through free distribution.
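
The condition-action structure of a rule described above can be sketched in a few lines. The sketch is written in Python and is not R++ syntax; the `Rule` class, the `fire_rules` function, and the example facts are hypothetical names and values introduced only to show a left-hand side (condition) paired with a right-hand side (action).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict], bool]  # left-hand side (LHS)
    action: Callable[[Dict], None]     # right-hand side (RHS)


def fire_rules(rules: List[Rule], facts: Dict) -> List[str]:
    """Evaluate each rule once, in order, and run the action of every rule
    whose condition holds. Returns the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(facts):
            rule.action(facts)
            fired.append(rule.name)
    return fired


# Made-up facts about one simulation run.
facts = {"simulated_incidence": 0.30, "empirical_upper_bound": 0.20, "alerts": []}

rules = [
    Rule(
        "incidence_too_high",
        condition=lambda f: f["simulated_incidence"] > f["empirical_upper_bound"],
        action=lambda f: f["alerts"].append("lower the infection rate parameter"),
    ),
    Rule(
        "incidence_negative",
        condition=lambda f: f["simulated_incidence"] < 0,
        action=lambda f: f["alerts"].append("incidence cannot be negative"),
    ),
]

fired = fire_rules(rules, facts)
```

This single pass over the rules is a simplification: it does not re-evaluate conditions after an action changes the facts, which a full rule engine would do.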

Diagnosis  is  another  somewhat  related  technique.  Diagnosis  is  concerned  with 
ensuring  a  product  works  correctly.  The  frame  of  thought  for  diagnosis  is  finding  the 
causes  of  symptoms  in  the  model,  assuming  that  the  model  is  correct  or  several 
alternative  candidate  models  are  correct.  Diagnosis  does  not  deal  with  the  validation  of 
models.  It  mostly  focuses  on  heuristic  inference,  except  for  model-based  diagnosis.  The 
models  and  the  processes  are  not  examined  to  see  if  they  are  valid  empirically  (they  are 
assumed  and  given  to  be  valid  a  priori  in  model-based  diagnosis).  Diagnosis  is  usually 
done  for  illness,  mechanical  malfunctions,  and  software  failure.  Tools  used  for  diagnosis 
include  expert  systems  (Jackson  1999)  and  Bayesian  networks. 

One of the subject matter experts’ approaches to validation, the Verification,
Validation and Accreditation (VV&A) method, is a regimented process to ensure that
each model and simulation and its data are used appropriately for a specific purpose,
usually for military systems development and acquisition. VV&A is conducted
throughout the modeling and simulation life-cycle management (LCM) process. While
VV&A has proven to be a successful approach for military systems, the task of VV&A is
labor intensive and involves several organizations. VV&A is done mainly by human
experts or trained personnel with the help of quantitative tools. Only organizations with
deep resources and sufficient time can apply VV&A.

^ http://www.research.att.com/sw/tools/r++


2.2  Why  Validation  of  Multi-Agent  Social-Network 
Simulations  is  Hard 


All  simulations  are  wrong,  but  some  are  useful.  It  is  currently  impractical  to  have 
simulations  completely  mirror  the  real  world,  except  for  the  cases  where  real  world 
processes  are  well  understood.  Validation  is  usually  performed  against  a  small  part 
and/or  an  abstracted  part  of  the  real  world  which  the  policy  question  at  hand  is  concerned 
with. 

The  task  of  validating  a  simulation  -  and  the  model  behind  it  -  against  that 
portion  of  the  reality  that  the  simulation  needs  to  address  is  hard  due  to  the  often-implicit 
assumptions,  unclear  correspondence,  uncertainty,  compounding,  combinatorial 
explosion  of  the  possible  combinations  of  parameter  values,  the  large  amount  of  time 
needed,  human  cognitive  limitations,  changes  in  the  real  world,  possibly  chaotic  system 
behavior,  interaction,  system  dependence,  the  non-Markovian  nature  of  the  real  world, 
and  emergence  of  patterns  or  behaviors. 

Validating multi-agent simulations is harder due to the magnitude of interactions
between agents, the increased action choices of agents, the knowledge dimension, and causal
relations. Even when neither networks nor organizations are present, the combinatorial
explosion of possible agent-to-agent interactions and actions makes validation difficult.
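
A rough sense of this combinatorial explosion can be had from a short calculation. The sketch below assumes, purely for illustration, that any two agents may interact and that every agent independently picks one of a fixed number of actions per time step; the function names are hypothetical.

```python
import math


def pair_count(n_agents):
    """Number of distinct agent-to-agent pairs: n choose 2."""
    return math.comb(n_agents, 2)


def joint_action_count(n_agents, actions_per_agent):
    """Number of joint action combinations in a single time step."""
    return actions_per_agent ** n_agents


for n in (10, 100, 1000):
    digits = len(str(joint_action_count(n, 3)))
    print(f"{n} agents: {pair_count(n)} pairs, joint actions have {digits} digits")
```

Pairs grow quadratically with the number of agents, while joint action combinations grow exponentially, so exhaustively testing combinations becomes infeasible even for modest populations.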

Multi-agent  social-network  simulations  are  even  harder  to  validate  because  they 
have an additional social-network aspect to contend with. The social network can have
multiple attributes such as friendship, family relationships, work relationships, and
others. The relationships between agents form dyads and triads with differing tie
strengths. The structure of the social network itself gives rise to cliques, coalitions,
isolates, and others. The social networks constrain possible agent behaviors, while agent
behaviors  shape  the  social  networks.  These  networks  give  rise  to  organizations. 

Dynamic  multi-agent  social-network  simulations  are  even  harder  to  validate 
because  they  are  dynamic  -  behaviors  and  agents  change  over  time.  Most  multi-agent 
social-network  simulations  are  dynamic. 

In  general,  any  model  that  deals  with  uncertainty  and  dynamics  is  complex  and 
potentially  hard  to  validate.  The  sources  of  uncertainty  include  ignorance,  ambiguities, 
belief/disbelief,  changing  worlds,  and  incorrect  and/or  incomplete  knowledge. 


2.3  Special  Challenges  posed  by  Subject  Areas 


Subject  areas  may  pose  special  challenges  to  the  task  of  validation.  Relevant  subject  areas 
for  this  dissertation  which  pose  special  challenges  are  biomedical  informatics, 
epidemiology,  and  social  science.  There  are  several  kinds  of  special  challenges: 

1.  Data  gathering:  fields  such  as  physics,  chemistry,  and  mechanical  engineering 
have  a  straightforward  data  gathering  procedure.  In  social  sciences  and 
biomedical  science,  the  data  gathering  process  is  more  complicated,  as  it  requires 
informed  consent  and  almost  always  involves  biases. 

2.  Data  quality:  in  physics  it  is  feasible  and  even  routine  to  have  data  with  high 
accuracy.  In  sociology,  when  data  is  gathered  by  using  surveys,  accuracy  is  not 
high.  In  medical  science,  some  data  have  good  accuracy  (such  as  the  genomic 
data),  while  others  do  not  (such  as  the  effectiveness  of  certain  treatments). 

3. Data quantity: it is generally harder and more costly to get data in large quantity
for sociology and organization science than for physical sciences.

4. Process clarity: in physics the processes are usually precisely defined and
understood. Many processes in social sciences, organizational science, and
medical science are not as precisely defined and understood.

In  this  dissertation,  validation  of  the  BioWar  testbed  presents  the  following 
challenges: 

1. Centers for Disease Control and Prevention (CDC) data about influenza is not precise. This is due
to  the  fact  that  there  is  no  precise  knowledge  about  influenza  manifestations.  We 
know  a  lot  about  the  influenza  virus,  but  do  not  know  the  infection  rate  and  the 
death  rate  of  influenza  for  an  individual.  This  is  due  to  the  complexity  of  the 
human  body.  We  do  not  know  when  precisely  influenza  symptoms  will  manifest 
for  each  individual. 

2.  The  incidence  rate  of  influenza  is  not  known  to  a  high  degree  of  accuracy.  Due  to 
the  nature  of  influenza  spread,  the  smaller  the  sample  area,  the  less  reliable  are  the 
statistics. 

Validation  of  the  CONSTRUCT  testbed  has  the  following  challenges: 

1. Limited data: Kapferer’s Zambia tailor shop data encodes the social and
instrumental  ties  extensively,  but  not  the  deep  knowledge  dynamics  of  each 
individual.  All  facts  are  encoded  to  be  the  same,  that  is,  if  a  fact  is  known  it  is 
encoded  as  1,  otherwise  as  0.  In  reality,  not  all  facts  are  the  same.  A  fact  may 
have  correlation  and  compounding  with  another  due  to  the  semantics  of  the  facts. 

2.  Imprecise  data:  the  tailor  shop  data  abstracts  the  societal  and  instrumental  ties. 
How  strong  the  ties  really  are  is  not  given.  While  granting  that  social  survey  is  not 
an  easy  task,  this  remains  a  hindrance  to  knowledge-based  inference. 

3.  Non-repeatability  of  social  situations:  while  WIZER  can  play  out  what-if 
scenarios,  the  hypothetical  scenarios  do  not  have  the  corresponding  empirical  data 
as  there  was  no  exactly  corresponding  social  situation.  In  general,  no  social 
situations  happen  twice  with  exact  precision,  unlike  physics  experiments.  History 
may repeat, but not exactly.


2.4 Validation and Policy Question


As the validity of a simulation is measured against the policy question the simulation is
designed for, the types and extent of knowledge need to be clarified, as follows:

o The reality of the universe. Part of the reality is basically known but the
totality of it is still unknown despite the advancement of science since the
Renaissance.

o The global knowledge of human beings. The knowledge may or may not be
true with respect to the reality of the universe.

o The knowledge relevant to the policy question at hand. This knowledge is a
subset of the global knowledge. This includes domain/epistemic knowledge
and empirical knowledge.

o The knowledge embodied in simulation models and simulation executions.
This knowledge may or may not be a subset of the policy question’s
knowledge.

Figure 2 illustrates the scopes of the knowledge spaces.

[Figure 2 image not reproduced; surviving label: Real world]

Figure 2. Knowledge Spaces

The task of validation is defined as the process of fitting the simulator’s
knowledge space into the policy question’s knowledge space. The simulator’s knowledge
space includes the static knowledge behind model specification and implementation, and
the dynamic knowledge arising from the execution of the simulator.
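
One simple way to picture this fitting is to treat each knowledge space as a set of knowledge items and to ask how much of the policy question's space the simulator covers. The sketch below rests on that assumption, which is a deliberate simplification for illustration and not the representation WIZER uses; the item names and the `fit_report` function are hypothetical.

```python
def fit_report(simulator_knowledge, policy_knowledge):
    """Compare two knowledge spaces represented as plain sets of items."""
    return {
        # Policy-relevant knowledge the simulator already embodies.
        "covered": simulator_knowledge & policy_knowledge,
        # Policy-relevant knowledge the simulator lacks.
        "missing": policy_knowledge - simulator_knowledge,
        # Simulator knowledge outside the policy question's scope.
        "extraneous": simulator_knowledge - policy_knowledge,
    }


# Made-up knowledge items for a disease-response policy question.
policy_knowledge = {"incubation period", "contact network", "school absenteeism"}
simulator_knowledge = {"incubation period", "contact network", "weather model"}

report = fit_report(simulator_knowledge, policy_knowledge)
coverage = len(report["covered"]) / len(policy_knowledge)
```

Under this reading, validation work concentrates on the "missing" items and on checking the "covered" ones, while "extraneous" items fall outside what the policy question requires.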

The  policy  question’s  knowledge  space  is  defined  as  all  relevant  knowledge 
pertaining  to  the  policy  question.  For  example,  in  a  biological  attack,  the  policy  question 
may  be  what  the  most  effective  response  is,  while  the  knowledge  enabling  this  question 
to  be  answered  forms  the  knowledge  space  of  the  policy  question.  Needless  to  say,  this 
knowledge space is much larger than the semantics of the policy question.

The policy question’s knowledge space is contained within the global knowledge
space. The global knowledge space, which includes physical and psychological
knowledge, may not necessarily capture the real world.

Empirical  data  is  assumed  to  be  part  of  the  global  knowledge  space.  In  actuality, 
empirical  data  functions  as  an  extender  of  the  global  knowledge  space  to  be  closer  to 
reality.  For  clarity,  I  chose  not  to  draw  another  circle  denoting  the  empirical  data  space 
which  intersects  the  global  knowledge  space. 

As  simulations  are  basically  knowledge  systems,  the  knowledge-based  approach 
enables  the  control  and  validation  of  simulations  directly  with  empirical  knowledge  and 
data.  The  knowledge-based  approach  has  a  representation  in  the  form  of  knowledge 
space,  a  generalization  of  version  space  (Mitchell  1978). 

A  positive  aspect  of  having  to  answer  a  policy  question  is  that  the  policy  question 
can  be  used  to  restrict  what  validation  needs  to  be  performed  on  a  simulation  system.  If 
the  policy  question,  for  example,  is  concerned  with  school  absenteeism,  then  the 
validation  task  is  made  to  focus  on  school  absenteeism,  not  on  other  data  streams  such  as 
work  absenteeism.  While  the  two  may  be  correlated,  validation  of  school  absenteeism 
does  not  need  work  absenteeism  data,  except  in  the  case  that  we  want  to  model  what 
effects  non-working  parents  have  on  children’s  school  absenteeism  rate.  Having  to 
answer a policy question reduces the search space and the number of inferences that need
to be made.


2.5  Mathematical  Reasoning  Automation 


An  integral  part  of  any  reasoning  system  is  mathematical  reasoning.  Numerical 
computations  are  the  domain  of  computers,  which  can  perform  them  effortlessly  and 
speedily. Symbolic math computation is also doable by software. Both numerical and
symbolic computation are achievements, but the most exciting area is automated
reasoning. Automated reasoning for math and logic (Hutter and Stephan 2005),
particularly for theorem proving and proof finding, has progressed significantly, to the
point that in 1996 the artificial mathematicians EQP and Otter proved a conjecture of the
so-called Robbins problem, a conjecture which had been open for sixty years, unsettled by the
best human mathematicians. The progress of automated reasoning in math and logic
provides hope for the future realization of a program that understands math (and does not just
manipulate bits, numbers, and symbols mindlessly).

I mention the automated reasoning for math and logic here to illustrate the power
and potential of automated reasoning. While this dissertation does not cover this aspect of
automated reasoning, there is a potential for cross-fertilization in the future.


2.6  Causal  Analysis,  Logic,  and  Simulation 


Statistical analysis is routinely employed to help experts perform validation. Statistical
analysis can infer parameters of a distribution from samples. The associations among
variables and the likelihood of events can be estimated. Dynamic environments, however,
entail changing experimental conditions, which make statistical analysis insufficient. For
example, the joint distribution of symptoms and diseases cannot say that curing the
former would or would not cure the latter. It cannot say how the distribution would differ
if external conditions were to change.

In contrast to statistical analysis, causal analysis - which can infer aspects of the data
generation process - can deal with dynamic changes. Here simulation plays an important
role, by quasi-experimenting with the data generation process. This enables the deduction of
not only the likelihood of events under static conditions, but also the dynamics of events
under changing conditions. Causal analysis and simulation enable the estimation of how
events which have not happened yet will play out (scenario analysis), of intervention
outcomes, and of what events will most likely happen.

Associational assumptions - such as Bayesian conditional or prior - can be tested
and estimated in principle given a sufficiently large sample. Causal assumptions, on the
other hand, cannot be verified even in principle, unless we use experimental control. This
is where carefully-designed simulation plays an important role. The simulation facilitates
quasi-experiments which, given good enough empirical data, can reflect the
consequences of causal assumptions in the real world.
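
The contrast between association and intervention can be illustrated with a small simulated quasi-experiment. The sketch below is only an illustration under assumed numbers: the function names and the probabilities (30% disease prevalence, symptom in 90% of diseased and 5% of healthy individuals) are hypothetical and do not come from any model in this dissertation. It compares the probability of disease among individuals observed without the symptom against the probability of disease after the symptom is suppressed by intervention.

```python
import random

def simulate(n, rng, cure_symptom=False):
    """Toy data-generating process: disease causes symptom, not vice versa."""
    records = []
    for _ in range(n):
        disease = rng.random() < 0.3  # assumed prevalence
        # Assumed: symptom appears in 90% of diseased, 5% of healthy individuals.
        symptom = rng.random() < (0.9 if disease else 0.05)
        if cure_symptom:
            symptom = False  # intervention: suppress the symptom only
        records.append((disease, symptom))
    return records

def p_disease(records, symptom=None):
    """Fraction of diseased individuals, optionally among a symptom group."""
    rows = [d for d, s in records if symptom is None or s == symptom]
    return sum(rows) / len(rows)

rng = random.Random(0)
observed = simulate(100_000, rng)
treated = simulate(100_000, rng, cure_symptom=True)

# Association: disease is rare among those observed without the symptom.
p_given_no_symptom = p_disease(observed, symptom=False)
# Intervention: removing the symptom leaves the disease rate untouched.
p_after_cure = p_disease(treated)
```

In this toy process the observed absence of the symptom is strong evidence against the disease, yet curing the symptom does not change the disease rate, which is the distinction the joint distribution alone cannot express.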

In  computer  science  and  artificial  intelligence,  there  has  been  work  on  integrating 
logical  and  probabilistic  reasoning.  Logical  reasoning  such  as  propositional  and  first- 
order  logic  is  brittle  especially  if  data  is  noisy.  Real  world  data  is  usually  noisy, 
especially in the humanities and social sciences. The longer a chain of logical
inferences, the greater the risk of irrelevant inferences. Logical reasoning
also  has  combinatorial  explosion  and  scalability  problems.  In  everyday  life,  people  do  not 
usually  form  a  long  chain  of  logical  reasoning.  Instead,  it  can  be  argued  that  people 
handle  minimal  logical  reasoning  but  have  superb  knowledge  representation.  Modeling 
and  simulation  is  one  of  the  most  accurate  tools  for  knowledge  representation.  Statistical 
reasoning,  on  the  other  hand,  lacks  the  structural  knowledge  of  the  world.  Modeling  and 
simulation  can  lend  logical  reasoning  robustness  and  statistical  reasoning  structural 
knowledge  of  the  world.  To  achieve  this,  it  is  important  that  the  modeling  and  simulation 
focus  on  real  world  processes  instead  of  just  pure  logic,  thus  the  importance  of  validation. 

While  a  rule-based  system  is  sufficient  if  knowledge  engineers  are  able  to  check 
the  causal  relations  inherent  in  some  rules,  for  large  knowledge  bases  manual  checks  are 
cumbersome  and  prone  to  errors.  Thus  there  is  a  need  for  automation  through  formal 
causality  checking. 

There  are  computer  models  for  learning  causal  relations  from  data  and  for  causal 
inference,  a  result  of  causal  analysis  research  at  Carnegie  Mellon,  UCLA,  and  Stanford 
(Pearl  2003,  Spirtes  et  al.  2000).  These  models  for  causality  however  do  not  consider 
simulation  as  an  important  tool  in  causal  analysis.  Instead,  they  rely  on  graph  analysis, 
Bayesian  models,  and  mathematical  analysis.  Here,  I  will  deal  primarily  with  causal 
inference,  not  with  the  causal  learning  from  data. 

A  state-of-the-art  causal  inference  model  is  the  Pearlian  causal  model  (Pearl  2003, 
Pearl  2000).  To  account  for  the  probability  of  causation,  the  Pearlian  causal  model 
requires  the  use  of  Bayesian  priors  to  encode  the  probability  of  an  event  given  another 
event.  It  is  unable  to  model  ignorance,  ignores  contradictions,  and  is  incapable  of 
expressing evidential knowledge without the use of the probability distribution format.
Different kinds of uncertainty (whether subjective or objective) are modeled the
same way using Bayesian distributions.

The  Pearlian  causal  model  is  insufficient  for  validation  of  simulations  for  several 
reasons: 

1. The aim is to do validation in uncertain and noisy environments.

2.  Assumptions  need  to  be  managed  explicitly,  not  through  conditional  probability. 

3.  There  is  a  need  to  clearly  delineate  between  subjective  uncertainty  (judgment)  and 
objective  uncertainty  (frequency). 

4.  Bayesian  priors  are  problematic  for  specification.  Rather  than  using  Bayesian 
priors  and  probabilistic  variables,  we  can  do  detailed  simulations.  In  addition  to 
inference/reasoning  mechanisms,  the  representation  is  important.  Using  graphs  or 
Bayesian  networks  as  representation  hinders  the  accurate  representation  of  reality. 
Simulations,  on  the  other  hand,  can  emulate  real  world  entities,  processes,  and 
mechanisms  closely. 

5. There is no simulation component in the Pearlian causal model, forcing it to resort
solely to graphs and the Markovian assumption to compute the effect of interventions.
The addition of knowledge inference renders any finite state machine that includes it
non-Markovian. An intervention in the causal network is represented by cutting
the path to the intervened variable from all other variables (deleting certain
mappings from the model), and setting the value of that variable to the
intervention value, while keeping the rest of the model unchanged. In this
dissertation, simulations are utilized to compute the effect of interventions. The
intervention and the simulation are the reflections of physical intervention and
reality. Identification - the determination of whether one can compute the
post-intervention distribution from data governed by the pre-intervention distribution -
is also possible by a direct estimation through simulation.
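
The intervention operation described in item 5 (cut the links into the intervened variable, fix its value, leave the rest of the model unchanged) can be sketched in a few lines. This is a minimal deterministic illustration, not the dissertation's implementation: the function `run_model`, the variable names, and the coefficients are all hypothetical. In a full simulation each mechanism would be a detailed, possibly stochastic, process rather than a one-line function.

```python
def run_model(mechanisms, order, interventions=None):
    """Evaluate a structural model; intervened variables ignore their parents."""
    interventions = interventions or {}
    values = {}
    for name in order:
        if name in interventions:
            # Cut the incoming links and set the variable to the intervention value.
            values[name] = interventions[name]
        else:
            values[name] = mechanisms[name](values)
    return values

# Hypothetical causal chain: outbreak -> school_closed -> absenteeism
mechanisms = {
    "outbreak": lambda v: 1,
    "school_closed": lambda v: v["outbreak"],
    "absenteeism": lambda v: 0.05 + 0.90 * v["school_closed"],
}
order = ["outbreak", "school_closed", "absenteeism"]

baseline = run_model(mechanisms, order)
kept_open = run_model(mechanisms, order, {"school_closed": 0})
```

Here `kept_open` leaves the upstream variable `outbreak` unchanged while the downstream `absenteeism` responds to the forced value, which mirrors the "rest of the model unchanged" requirement.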

Simulation,  however,  is  not  without  weaknesses.  Any  simulation  model  is  only  as 
good  as  its  assumptions.  Additional  weaknesses  include  inherent  difficulties  in  getting 
decision  rule  accuracy,  soft  variables,  and  model  boundaries  right.  Decision  rules  for 
each agent are difficult to obtain and verify. Their accuracy is often uncertain. Software
architects are often in need of the skills of domain experts for this decision rule
determination.  Some  variables  affecting  the  agent  decision  are  soft  in  nature.  For 
example,  an  agent  may  buy  an  automobile  if  it  is  inexpensive,  reliable,  stylish,  and  fun  to 
drive.  The  variables  -  especially  the  last  two  -  are  soft,  meaning  they  are  hard  to  quantify 
and  their  meanings  can  differ  greatly  from  agent  to  agent.  In  building  a  simulation  model, 
software  architects  make  decisions  about  which  variables  are  exogenous  and  which  are 
endogenous  to  the  model.  The  decisions  have  a  large  effect  on  the  model  prediction.  This 
dissertation  provides  a  remedy  to  the  weaknesses  above,  particularly  in  managing  the 
model  assumptions  and  managing  the  model  boundaries.  It  also  suggests  an  avenue  to 
ameliorate  the  decision  rule  accuracy  and  soft  variables  problems  by  knowledge-based 
validation. 

Human  beings  constantly  fashion  causal  relations,  even  for  complex  systems. 
Many  of  these  causal  relations  are  spurious,  but  some  people  take  them  for  granted. 
These need to be modeled in social simulation systems since, however misguided the
causal relations and assumptions may be, they guide manifested human behaviors.


2.7  Knowledge-based  Approach 


Knowledge-based  representation  can  theoretically  represent  any  other  representation.  A 
knowledge-based  approach  is  a  promising  approach  for  automating  validation  and  model- 
improvement  of  simulations,  particularly  multi-agent  social-network  simulations. 
Knowledge-based  approaches  denote  the  use  of  knowledge  representation  and  inference, 
whose  manifestations  are  in  the  form  of  knowledge  base  and  inference  engine  in 
Artificial Intelligence (Russell and Norvig 2003). Systems utilizing a knowledge-based
approach are called knowledge-based systems. An example of a knowledge-based system
is Cyc (Lenat and Guha 1990). Cyc is a very large, general, multi-contextual “common
sense” knowledge base and inference engine. It is an attempt to do symbolic Artificial
Intelligence on a massive scale by vacuuming up facts, rules of thumb, and heuristics about
entities and events. Despite its massive knowledge bases, Cyc is still brittle due to its
purely symbolic approach.

The breadth and depth of knowledge needed for validation and model-improvement
usually exist in databases and/or silos of expert knowledge. Capturing this
knowledge in a form that can be processed by computers is a cornerstone of the
knowledge-based approach. The computerized knowledge processing is usually totally
disconnected from the process of designing, validating, and improving a simulation
model: conventionally it is the job of human experts without the aid of databases.
Corporations and government agencies have large databases and separate simulation
projects: combining the two effectively can provide new insights.

Simulations are basically knowledge systems. They can be viewed as a black box
spewing out knowledge as output given certain pieces of knowledge as input. Testing a
black box is done by giving a certain input and observing the outputs. This is similar to
the cracking of the Enigma machine by utilizing the existing knowledge, not solely by
statistical tests.

The knowledge-based approach is mostly symbolic, which supports the modeling
of intelligence and reason. Indeed, the Physical Symbol System hypothesis (Simon 1996)
proposes that a physical symbol system has the necessary and sufficient means for
general intelligence, which is a Strong AI view. This dissertation subscribes to the view
that a physical symbol system is important but the underlying non-symbolic physical
processes are the true foundation. It attempts to capture the physical processes via
simulations and the symbol system via knowledge-based and ontological reasoning. It is
midway between the Strong AI and Weak AI views. The advantages of symbolic
architectures are:

o much of human knowledge is symbolic, so encoding it in a computer is more
straightforward.

o how the symbolic architecture reasons may be analogous to how humans do,
making it easier for humans to understand.

o symbolic architecture may be made computationally complete (e.g. Turing
Machines).

The knowledge-based approach allows the structure of the real world problem to
be incorporated and reasoned about. This stands in contrast to the statistical approach,
which cannot handle structural and componential knowledge or the fertile inferences of
logic well. The knowledge-based approach can capture a user's informal and intuitive
understanding of a system. Thus, unlike formal methods, the knowledge-based approach
(including rule-based and causal methods) is suitable for validation. The knowledge-based
approach lends itself to hypothesis building and testing.


2.8  Knowledge  Acquisition  Bottleneck 


For  any  knowledge-based  system  to  have  value,  the  knowledge  bases  need  to  be 
constructed.  This  is  done  by  knowledge  acquisition  from  human  experts  possibly  in  the 
form  of  heuristics.  Knowledge  acquisition  takes  time  and  is  prone  to  errors.  How  to  learn 
knowledge  automatically  from  data  is  an  active  area  of  research.  Causal  learning  from 
data  is  an  example.  Machine  learning  and  data  mining  are  two  fields  dealing  with 
learning  and  extracting  knowledge  from  data  respectively. 

This dissertation suggests a policy-based way to minimize the problems of the
knowledge acquisition bottleneck. It puts the knowledge acquisition on the shoulders of
the persons who are the likeliest to possess the knowledge and to have an interest in entering
it correctly. This means the simulation knowledge bases are acquired and input by the
simulation  developers,  while  the  validation  knowledge  bases  are  acquired  and  input  by 
the  validators  or  the  VV&A  practitioners. 

While  the  above  may  minimize  the  problems  of  knowledge  acquisition,  true 
knowledge  acquisition  should  happen  automatically.  This  dissertation  suggests  a  simple 
but  powerful  method  of  hypothesis  building  and  testing  (first  in  the  simulation  proxy  and 
then  with  the  empirical  data).  The  difference  between  this  method  and  machine  learning 
is  that  machine  learning  focuses  on  general  algorithms  and  is  knowledge  poor,  while  our 
method of hypothesis building (e.g., constructing new causal relations in a causal
network) and testing is knowledge intensive and is not focused on any general (or
specific) algorithm.


2.9 Models, Inference, and Hypothesis Building and Testing


All  models  are  approximations.  There  are  mechanistic  models  (models  that  have  physical 
mechanisms  related  to  them  available)  and  empirical  models  (no  underlying  physical 
mechanisms  are  known  so  the  model  is  purely  empirical).  When  available,  a  mechanistic 
model  has  advantages  because  it  may  provide  a  physical  understanding  of  the  system  and 
greatly  accelerate  the  process  of  problem  solving  and  discovery.  A  mechanistic  model 
frequently  requires  fewer  parameters  and  thus  provides  estimates  of  the  fitted  response 
with  proportionately  smaller  variance.  Sometimes  however  an  empirical  model  can 
suggest  a  mechanism. 

Mechanistic  models  are  constructed  with  the  structural  knowledge  of  the  relevant 
real  world  processes.  As  structural  knowledge  is  formalized  and  put  into  knowledge 
bases,  inferences  from  the  knowledge  bases  can  be  made.  The  knowledge  bases  also 
show  the  extent  and  the  uncertainty  of  the  knowledge  therein.  Based  on  the  existing 
knowledge  and  knowledge  about  the  unknown  and/or  the  uncertain,  hypotheses  can  be 
constructed. The simplest hypothesis construction is done by searching and/or reasoning
through the knowledge bases and ontology to look for implications that have not been
explored. Simulation then allows the hypotheses to be tested in proxy.


2.10 Alert and Inference Engine


Knowledge-based systems operate on mostly symbolic data. They employ a symbolic
inference  engine.  Simulation  systems  operate  on  mostly  numerical  data.  In  order  to  use 
numerical  data  from  simulation  systems,  there  is  a  need  to  convert  them  to  symbolic 
information.  This  is  accomplished  by  the  Alert  module,  which  tests  numerical  data  using 
statistical  routines  against  certain  criteria  to  produce  symbolic  information.  For  example, 
the  Alert  module  tests  the  simulated  average  yearly  school  absenteeism  against  the 
empirical  minimum  and  maximum  value  of  the  annual  absenteeism  rate,  and  produces  a 
“value-too-high”  alert  if  the  simulated  average  is  higher  than  the  empirical  maximum. 
The  criteria  to  test  against  are  normally  based  on  empirical  data,  but  they  do  not  have  to 
be. Tests based on empirical knowledge can also be used. There are many other types of
tests that could be performed, including classification, peak detection, anomaly detection,
etc.  The  statistical  routines  used  vary  from  the  simplest  to  the  most  sophisticated.  If  the 
best  information  that  could  be  gathered  is  uncertain,  it  is  represented  in  probabilistic  form 
and  written  as  symbolic  information.  For  example,  if  we  are  interested  in  the  probability 
of  a  child  going  to  school  on  day  D,  and  the  outputs  of  the  simulation  show  that 
probability  being  equal  to  0.9,  we  could  put  it  symbolically  as  “a  child  going  to  school  on 
day  D  is  uncertain  with  probability  0.9”.  This  is  then  encoded  as  a  propositional  variable 
(in  the  form  of  a  symbolic  “alert”)  for  the  Inference  Engine  to  process.  Similar  encoding 
is  applied  to  the  information  about  value  ranges  and  curves. 
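
The range test in the school absenteeism example can be sketched as follows. This is a minimal illustration of the idea, not the actual Alert module: the function name `range_alert`, the alert labels other than “value-too-high”, and the sample numbers are assumptions made for the example.

```python
from statistics import mean

def range_alert(simulated_values, empirical_min, empirical_max):
    """Turn numeric simulation output into a symbolic alert."""
    average = mean(simulated_values)
    if average > empirical_max:
        return "value-too-high"
    if average < empirical_min:
        return "value-too-low"    # assumed label, by analogy with the text
    return "value-in-range"       # assumed label for a passing test

# Hypothetical yearly absenteeism rates from three simulation runs,
# tested against an assumed empirical range of 3% to 8%.
alert = range_alert([0.12, 0.15, 0.14], empirical_min=0.03, empirical_max=0.08)
```

The returned string plays the role of the symbolic alert that is passed on to the Inference Engine; more sophisticated statistical tests would replace the simple comparison of the mean.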

After  symbolic  information  is  gathered,  the  information  is  fed  into  the  Inference 
Engine.  The  Inference  Engine  is  a  production  system.  The  inference  proceeds  in  a 
forward-chaining  fashion.  The  Inference  Engine  takes  the  alerts  and  the  simulator’s 
causal  diagram,  in  addition  to  the  empirical  data  and  the  domain  knowledge  and 
parameter constraints, to make a judgment on which parameters, causal links, and
meta-models to change - or not to change - and how.
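
Forward chaining over symbolic alerts can be sketched with a very small production system. The rule contents and fact names below are hypothetical and chosen only to echo the absenteeism example; a real engine would use pattern matching with variables rather than plain string facts.

```python
def forward_chain(facts, rules):
    """Fire rules whose premises all hold until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Each rule is (set of premises, conclusion); all names are illustrative.
rules = [
    ({"absenteeism:value-too-high", "link:infection->absenteeism"},
     "suspect:infection-rate"),
    ({"suspect:infection-rate", "constraint:infection-rate-adjustable"},
     "action:lower-infection-rate"),
]
facts = {
    "absenteeism:value-too-high",            # alert from the Alert module
    "link:infection->absenteeism",           # from the causal diagram
    "constraint:infection-rate-adjustable",  # parameter constraint
}
derived = forward_chain(facts, rules)
```

Starting from an alert, a causal link, and a parameter constraint, the loop derives a suggested parameter change in two rule firings; without the causal link or the constraint no action is derived.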


2.11  Summary 


To validate multi-agent social-network systems designed to characterize complex social
problems, a new approach is needed. The new approach must be scalable to a large
number  of  variables  and  a  high  number  of  interactions.  The  new  approach  must  be 
sufficiently  automated  that  it  can  be  reapplied  as  new  data  comes  to  light  and  the  model  is 
changed.  It  must  be  flexible  enough  to  handle  data  at  different  levels  of  granularity, 
missing  data,  and  otherwise  erroneous  and/or  messy  data.  Most  importantly,  it  must  be 
able,  at  least  in  principle,  to  capture  a  user's  intuitive  informal  understanding  of  a  system. 
Formal  methods  lack  precisely  this  ability,  rendering  them  not  applicable  for  validation. 
Formal  methods  are  restricted  by  their  need  to  be  a  closed  world  and  to  be  logically 
derivable. The only technique that can scale and fulfill the above requirements is the
knowledge-based method, as knowledge can be as abstract or as detailed as needed.

Capturing  existing  knowledge  in  a  form  that  can  be  processed  by  computers,  the 
knowledge-based  approach  allows  knowledge  inference.  Furthermore,  the  knowledge- 
based  approach  is  able  to  focus  the  search  and  inference  in  parameter  space.  It  is  scalable 
due  to  its  intelligent  focus  with  the  help  of  ontology.  It  can  also  process  causal  knowledge 
and  deal  with  imperfect  data. 

One drawback of the knowledge-based approach is that there is currently a bottleneck
in knowledge acquisition from human experts. The data are plentiful, particularly in
bioinformatics, biomedical informatics, economics, and the social sciences. Trends in data
gathering point to a deluge of data in more fields as time progresses. Current research on
data mining and causal learning from data shows that it is feasible to extract knowledge
from data. Another drawback of the knowledge-based approach is that it also needs
validation, that is, the validation of knowledge bases. The validation of knowledge bases is,
however, easier for human experts than the validation of simulations, as the knowledge bases are
largely in the form of human-level rules and relations. Besides, stakeholders who want to
validate simulations have the domain knowledge required to validate them, and should
provide it for the validation process.


Chapter III: Inference in Artificial Intelligence and the Scientific Method


Knowledge-based systems (Stefik 1995) include reasoning steps or inferences on
knowledge bases. An Inference Engine is the part of knowledge-based systems that
controls and performs the inferences. Ontology, a specification of conceptualizations,
augments knowledge bases. The type and nature of inferences need to be designed to fit
the problem domain.

This chapter explores existing inference techniques in artificial intelligence and in
the  scientific  method,  describes  their  strengths  and  weaknesses,  and  shows  a  new 
inference  technique  suitable  for  use  in  validation  and  model-improvement. 

Inference  is  a  part  of  learning.  While  learning  in  artificial  intelligence  (e.g., 
machine  learning,  data  mining,  explanation-based  learning,  causal  learning  from  data, 
reinforcement  learning,  case-based  learning,  etc.)  is  an  intriguing  topic,  I  chose  not  to 
delve  into  learning  -  outside  inference  and  a  simple  hypothesis  building  and  testing  -  to 
limit  the  scope  of  the  work. 


3.1  Inference  Techniques  in  Artificial  Intelligence 


Artificial  intelligence  is  a  study  of  how  to  make  computers  do  things  at  which,  at  the 
present  moment,  people  are  better.  It  deals  with  how  to  make  computers  act  in  a  human 
fashion,  think  like  humans,  act  rationally,  and/or  think  rationally  (Russell  and  Norvig 
2003). 

Inference  is  the  act  or  process  of  drawing  a  conclusion  based  solely  on  what  one 
already  knows.  Inference  is  studied  within  several  different  disciplines.  Human  inference 
(i.e., how humans draw conclusions) is studied within the field of cognitive psychology
(Sternberg and Pretz 2005). The rules and processes of inference are some of the oldest
subject  matters  in  philosophy.  Logic  studies  the  laws  of  valid  inference.  Statisticians  have 
developed  formal  “rules”  for  inference  from  quantitative  data  (Lehmann  and  Romano 
2005).  Artificial  intelligence  researchers  develop  automated  inference  systems. 


3.1.1  Inference  by  Search 

Along  with  representation,  search  is  fundamental  in  artificial  intelligence.  Search,  by  its 
virtue  of  looking  for  and  of  testing  possible  solutions,  can  be  thought  of  as  inference.  In 
search,  the  sequence  of  actions  required  for  solving  a  problem  cannot  be  known  a  priori 
but  must  be  determined  by  a  trial-and-error  exploration  of  alternatives.  Almost  all 
artificial  intelligence  problems  require  some  sort  of  search.  There  are  different  kinds  of 
search: 

•  Search  in  search  space  (Russell  and  Norvig  2003) 

o  The  representation  of  search  space  usually  takes  a  form  of  graph  or 
cellular  tessellation.  The  way  the  search  is  performed  can  be  breadth-first, 
depth-first,  best-first,  or  heuristic. 

•  Search  in  production/expert  systems  (Giarratano  and  Riley  2004) 

o  In  a  backward  chaining  procedure,  the  search  is  performed  for  facts  that 
make  the  premises  of  a  rule  eligible  for  firing  the  rule  containing  the  goal 
as  a  result.  In  both  forward  and  backward  chaining  procedures,  search  is 
carried  out  for  facts  matching  the  clauses  of  a  rule.  Forward  chaining  is 
very  useful  for  a  system  to  respond  rapidly  to  changes  in  its  knowledge 
and  to  be  able  to  detect  one  of  a  large  number  of  possible  unusual  events. 
On  the  other  hand,  backward  chaining  is  more  directed,  and  so  is  more 
appropriate  for  a  system  that  knows  what  it  is  trying  to  do. 

•  Search  in  genetic/evolutionary  algorithm  (Goldberg  1989) 

o  In  genetic/evolutionary  algorithm,  search  is  a  function  of  the  fitness  value 
and  is  carried  out  by  genetic/evolutionary  operators.  That  is  to  say, 
subpopulations with the best fitness scores are sought and then
recombined (by mutation, crossover, and other genetic/evolutionary
operators) to produce offspring. The process of fitness selection is then
repeated with this progeny population.
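
The genetic/evolutionary search loop described in the last item above (select by fitness, recombine by crossover and mutation, repeat with the offspring) can be sketched as follows. This is a generic textbook-style sketch under assumed settings, not an algorithm used in this dissertation: the bit-string encoding, population size, mutation rate, and the rule of keeping the best individual unchanged are all choices made for the example.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40, seed=0):
    """Minimal genetic algorithm over bit strings (illustrative only)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # fitness-based selection
        children = [population[0][:]]           # keep the best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.2:              # occasional single-bit mutation
                flip = rng.randrange(length)
                child[flip] = 1 - child[flip]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history

# Toy fitness: the number of 1-bits in the string.
best, history = evolve(fitness=sum)
```

Because the best individual is always carried over, the recorded best fitness never decreases from one generation to the next; note that the operators themselves use no domain knowledge.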

The  strength  of  search  is  its  generality  and  applicability.  The  weakness  of  search  is  the 
explosion  of  the  number  of  items  or  states  a  search  algorithm  usually  has  to  deal  with. 
More  specifically,  for 

•  Search  in  search  space  (Russell  and  Norvig  2003) 

o  Strengths:  its  generality  and  mathematical  soundness, 
o  Weaknesses:  the  large  number  of  states,  the  need  for  heuristics,  and  the 
need  for  Markovian  assumption. 

•  Search  in  production/expert  systems  (Giarratano  and  Riley  2004) 

o  Strengths:  it  operates  in  the  symbol  space,  which  is  usually  smaller  in  size 
than  the  state  space.  It  does  not  need  to  have  Markovian  assumption, 
o Weakness: the straightforward implementation of expert systems is
inefficient. It keeps a list of the rules and continuously cycles through
the list, checking each rule's left-hand side (LHS) against the knowledge
base and executing the right-hand side (RHS) of any rules that apply. It is
inefficient because most of the tests made on each cycle will have the
same results as on the previous iteration: since the knowledge base is
mostly stable, most of the tests will be repeated. The computational
complexity  is  in  the  order  of  O(RF^P),  where  R  is  the  number  of  rules,  P 
is  the  average  number  of  patterns  or  clauses  per  rule  LHS,  and  F  is  the 
number  of  facts  on  the  knowledge  base.  This  is  alleviated  by  the  Rete  I 
algorithm  (Forgy  1982).  In  the  Rete  I  algorithm,  only  new  facts  are  tested 
against  any  rule  LHS.  Additionally  new  facts  are  tested  against  only  the 
rule  LHS  to  which  they  are  most  likely  to  be  relevant.  As  a  result,  the 
computational  complexity  per  iteration  drops  to  O(RFP),  or  linear  in  the 
size  of  the  fact  base.  Rete  I  has  high  memory  space  requirements.  The 
Rete  II  and  III  algorithms  are  said  to  have  fixed  this  memory  problem,  but 
the algorithms are a trade secret and thus not in the public domain. Using
context or structured knowledge, we may be able to avoid high
computational complexity.

•  Search  in  genetic/evolutionary  algorithm  (Goldberg  1989) 

o  Strengths:  it  mimics  evolution  in  nature,  a  simple  but  powerful 
mechanism.  It  is  relatively  robust  to  environmental  change  and  individual 
failures. 

o  Weaknesses:  it  only  changes  in  incremental  fashion  and  it  is  slow  to 
converge.  It  is  prone  to  dead  ends  and  suboptimal  solutions.  Unless  there 
is  an  incentive  for  diversity,  the  populations  tend  to  become  homogeneous 
within  one  particular  environment  or  niche.  If  the  environment  drastically 
changes,  the  previously  fit  populations  could  disappear  in  a  short  period  of 
time.  While  it  is  robust,  it  does  not  have  the  methodological  rigor  of  the 
scientific  method.  It  is  knowledge-poor.  It  does  not  directly  support  the 
accumulation  of  knowledge.  It  relies  on  knowledge-less  evolutionary 
operators  of  mutation  and  crossover. 
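
To make the complexity figures quoted above for production systems concrete, consider a hypothetical knowledge base with R = 100 rules, F = 1000 facts, and P = 3 patterns per rule LHS. These sizes are assumed purely for illustration. The naive cycle and the Rete-style per-iteration cost then compare as:

```latex
\[
  R\,F^{P} = 100 \times 1000^{3} = 10^{11},
  \qquad
  R\,F\,P = 100 \times 1000 \times 3 = 3 \times 10^{5}.
\]
```

Under these assumed sizes the difference is more than five orders of magnitude, which is why incremental matching matters even for moderately sized rule bases.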


3.1.1.1 Is Search Unavoidable?

Human  scientists  reason  by  experiments  and  accumulation  of  knowledge,  in  addition  to 
search.  Grandmasters  in  chess  reason  by  using  carefully  learned  structured  domain 
knowledge.  Novice  chess  players  do  a  lot  of  unsophisticated  analyses  or  searches.  The 
amount  of  analysis  and  search  increases  significantly  for  the  intermediate  level  players. 
The  surprise  is  that  grandmasters  do  not  perform  more  analyses  or  searches  than  the 
intermediate  level  players.  They  instead  carefully  construct  highly  sophisticated  domain 
knowledge and use it effectively. So the answer to the question “is search unavoidable?”
is “no, search is avoidable”. The qualification to this answer is that search is
unavoidable if specialized knowledge cannot be constructed. If knowledge can be
constructed effectively, then search is avoidable. Thus much of the computational
complexity  hindering  an  effective  use  of  algorithms  could  potentially  be  avoided  if  the 
right  knowledge  is  effectively  constructed  and  used. 

3.1.2 Inference by Logic


One of the foundations of modern science is logical inference (Russell and Norvig
2003). Logical inference is a systematic method of deriving logical conclusions from
premises assumed or known to be true and from factual knowledge or evidence. Logic is
the  science  of  reasoning,  proof,  thinking,  or  inference.  Logic  allows  the  analysis  of  a 
piece  of  reasoning,  and  the  determination  of  whether  it  is  correct  or  not.  In  artificial 
intelligence,  logical  inference  is  formalized  into  several  kinds  of  logic,  including: 

o Propositional logic: a mathematical model for reasoning about the truth of
propositions. Propositions are logical expressions or sentences whose truth
values can be determined.

o First order logic or predicate logic: a mathematical model for reasoning about
the truth of sentences that contain variables, terms, and quantifiers.

o Second order logic: a mathematical model for reasoning about the truth of
sentences that contain variables, terms, and quantifiers that may also range over
predicates and functions. An example of this second order logic is situational
calculus (Reiter 2001).

o Temporal logic: a mathematical model for reasoning about propositions
qualified in terms of time.

The  strengths  of  logic  are: 

o  The  statements  of  logic  are  concise  and  clear. 

o  If  the  facts  are  watertight,  inferences  drawn  from  them  are  also  watertight, 
provided  a  proper  inference  mechanism  is  used. 
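
The "watertight" property can be illustrated for propositional logic by checking entailment over every truth assignment. The following is a minimal sketch; the function name entails and the encoding of sentences as Python functions over a model are assumptions made for illustration only.

```python
from itertools import product

def entails(premises, conclusion, symbols):
    """True if the conclusion holds in every model that satisfies all premises."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # counterexample found: premises true, conclusion false
    return True

# Sentences are encoded as functions from a model (dict) to a truth value.
p_implies_q = lambda m: (not m["P"]) or m["Q"]
p = lambda m: m["P"]
q = lambda m: m["Q"]

# Modus ponens is valid; affirming the consequent is not.
modus_ponens_ok = entails([p_implies_q, p], q, ["P", "Q"])
affirming_consequent_ok = entails([p_implies_q, q], p, ["P", "Q"])
```

Enumerating all models is exponential in the number of symbols, so this is a didactic check rather than a practical inference mechanism.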

The  weaknesses  of  logic  are: 

o Whether logic governs the workings of the universe is debatable. Logic does
not trump physical experiments. Quantum mechanics, while strange to normal
human logic and experience about how the (macro) world should operate, has
been shown to be valid experimentally.

o Logic requires statements to be either true or false. The probabilistic nature of
events and processes means that logic cannot be used without modification.
Furthermore, some statements are neither true nor false.

o Logic is only part of the mental processes governing social systems.


o  Predicate  logic  or  first-order  logic  requires  the  predicate  to  be  cleanly  defined. 
Predicate  acts  like  a  membership  function.  For  example,  the  assertion 
bird(penguin)  with  the  predicate  bird  and  the  instance  penguin  assumes  that  a 
clear  definition  exists  for  the  predicate  bird.  If  the  definition  for  bird  is  that  of 
an  animal  that  can  fly  and  has  feathers,  then  the  assertion  of  penguin  being  a 
bird  is  false,  even  though  in  reality  it  is  true.  In  social  sciences,  the  difficulty 
encountered  in  the  attempt  to  cleanly  delineate  predicates  is  more  pronounced. 
For  example,  clean  definition  is  difficult  for  the  predicates  family,  marriage, 
friends,  enemies,  middle-class,  etc.  Without  the  ability  to  precisely  delineate 
predicates,  first-order  logic  and  second-order  logic  which  are  based  in  part  on 
predicates will not be able to perform accurately. This means the logic and the
result of the logical reasoning become fuzzy.

o Second-order logic requires both the predicate and the function to be cleanly
delineated.

The  above  describes  deductive  logic.  In  addition  to  deductive  logic,  there  is 
inductive  logic.  An  argument  is  deductive  if  it  is  thought  that  the  premises  provide  a 
guarantee  of  the  truth  of  the  conclusion.  An  inductive  argument,  on  the  other  hand,  only 
attempts,  successfully  or  unsuccessfully,  to  provide  evidence  for  the  likely  truth  of  the 
conclusion,  rather  than  outright  proof.  Deductive  logic  works  from  the  more  general  to 
the  more  specific,  while  inductive  logic  works  the  other  way.  Inductive  reasoning  is  more 
open-ended and exploratory, while deductive reasoning is narrower and is concerned with
testing or confirming hypotheses. Both kinds of reasoning are usually present in synergy in
scientific experiments. Deductive reasoning exists to confirm hypotheses from theories,
while  inductive  reasoning  exists  to  build  theories  from  observations. 

3.1.3  Rule-Based  Systems 

Given  a  set  of  facts  and  assertions,  a  rule-based  system  (Durkin  1994,  Jackson  1999)  can 
be  created  by  specifying  a  set  of  rules  on  how  to  act  on  the  set  of  facts  and  assertions.  A 
rule  is  a  statement  composed  of  a  condition  and  an  action  that  specifies  what  to  do  when 
the condition becomes true. This forms the basis for expert systems (Durkin 1994,
Jackson  1999).  The  concept  of  an  expert  system  is  that  the  knowledge  of  an  expert  is 
encoded  into  the  rule  set.  When  exposed  to  the  same  data,  the  expert  system  will  perform 
in  a  manner  similar  to  the  expert.  When  feasible,  it  is  desirable  to  derive  knowledge 
directly  from  data.  Causal  relation  learning  is  one  of  the  methods  to  do  this. 

Rule-based  systems  are  feasible  for  problems  for  which  most  of  the  knowledge  in 
the  problem  area  can  be  represented  in  the  form  of  rules  and  for  which  the  problem  area 
is  not  too  large  to  manage.  These  systems  however  hide  the  data  generation  process. 
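
A condition-action rule set acting on a fact base can be sketched with naive forward chaining. This is a minimal sketch, not the design of any particular expert system shell; the rule format (a list of conditions paired with one conclusion) and the example facts are hypothetical.

```python
def forward_chain(facts, rules):
    """Fire rules until no rule adds a new fact (naive forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are in the fact base.
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rules: (conditions, conclusion)
RULES = [
    (["fever", "cough"], "suspect_flu"),
    (["suspect_flu", "short_of_breath"], "refer_to_doctor"),
]

derived = forward_chain(["fever", "cough", "short_of_breath"], RULES)
```

The sketch also shows why such systems hide the data generation process: the rules map observations to conclusions without saying anything about the mechanism that produced the observations.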


3.1.4  Model-based  Reasoning 

Rule-based  systems  have  disadvantages.  They  hide  the  data  generation  process  and  the 
model  of  the  problem.  It  is  very  difficult  to  build  a  complete  rule  set.  It  is  time- 
consuming  and  error-prone  to  elicit  empirical  associations  or  heuristics  for  rules  from 
human  experts.  Adding  new  rules  requires  consideration  of  the  whole  rule  set,  given  that 
the  rules  are  frequently  interdependent.  Furthermore,  even  if  a  rule  set  is  complete,  there 
is  a  chance  of  it  becoming  obsolete.  Rules  are  notoriously  brittle.  When  faced  with  inputs 
that  deviate  slightly  from  the  normally  expected,  symbolic  rule-based  systems  are  prone 
to fail. Model-based reasoning provides a way to ameliorate these weaknesses of rule-
based systems. Model-based reasoning, however, works well when there is a complete
and accurate model and degrades as the model becomes less accurate or less
comprehensive. Causal relations, which do not require a complete model, are however a
good approximation to models.


3.1.4.1 Assumptions-Based Truth Maintenance System

In  diagnosis,  rule-based  expert  systems  represent  diagnostic  knowledge  mainly  in  terms 
of  heuristic  rules,  which  perform  a  mapping  between  data  abstractions  (e.g.,  symptoms) 
and  solution  abstractions  (e.g.,  diseases).  This  kind  of  knowledge  representation  is 
shallow, in the sense that it does not contain much information about the data generation
process, the causal mechanisms, and the empirical (physical, chemical, and biological)
models underlying the relationships between diseases and symptoms. In everyday life,
it is quite common to operate exclusively on rules without understanding or
appreciating how and for what purpose the rules were created. The rules typically reflect
empirical  associations  or  heuristics  derived  from  experience,  rather  than  a  theory  of  how 
a  device,  organism,  or  system  actually  works.  The  latter  is  deep  knowledge  in  the  sense 
that  it  contains  the  understanding  of  the  structure,  functions,  and  components  of  the 
device  or  system. 

Rather  than  assuming  the  existence  of  an  expert  experienced  in  diagnosing  a 
problem,  model-based  approaches  assume  the  existence  of  a  system  description:  a 
consistent and complete theory of the correct behaviors of the system. The Assumptions-
Based Truth Maintenance System (ATMS) is one such approach. Given a data set
about one or more malfunctions, ATMS conjectures one or more minimum perturbations to the
system description that would account for the malfunction(s).

The  advantages  of  this  deep  knowledge  approach  over  heuristic  rule-based 
systems  are  (Jackson  1999): 

o  Given  a  system  description,  the  software  architect  is  able  to  avoid  the 
laborious  process  of  eliciting  empirical  associations  from  a  human  expert. 

o  The  reasoning  method  is  system  independent,  so  it  is  not  necessary  to  tailor 
the  inference  machinery  for  different  applications. 

o  Since  only  knowledge  of  correct  system  behavior  is  required,  the  method  is 
capable  of  diagnosing  faults  that  have  never  occurred  before. 
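
The idea of conjecturing minimal perturbations to a description of correct behavior can be illustrated with a small consistency-based diagnosis sketch. It is not an ATMS implementation; it only shows the principle on a hypothetical two-inverter circuit, and all names (COMPONENTS, consistent, minimal_diagnoses) are assumptions made for illustration.

```python
from itertools import combinations, product

# System description: component name -> (input wire, output wire).
# Every component is assumed to be an inverter when it works correctly.
COMPONENTS = {"A": ("x", "y"), "B": ("y", "z")}
WIRES = ["x", "y", "z"]

def consistent(faulty, observations):
    """True if some wire assignment matches the observations while every
    component not assumed faulty behaves as an inverter."""
    for values in product([0, 1], repeat=len(WIRES)):
        wires = dict(zip(WIRES, values))
        if any(wires[w] != v for w, v in observations.items()):
            continue
        if all(wires[out] == 1 - wires[inp]
               for name, (inp, out) in COMPONENTS.items()
               if name not in faulty):
            return True
    return False

def minimal_diagnoses(observations):
    """Smallest sets of components whose failure explains the observations."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for cand in combinations(sorted(COMPONENTS), size):
            cand = set(cand)
            if any(d <= cand for d in found):
                continue  # a smaller diagnosis already explains the data
            if consistent(cand, observations):
                found.append(cand)
    return found
```

With input x = 1 and output z = 0, the sketch returns the two single-fault diagnoses {A} and {B}; adding the observation y = 0 narrows this to {B}. Only the description of correct behavior is used, so no fault-specific rules are needed.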

3.1.5 Causal Reasoning


When there are good reasons to believe that events of one sort, the causes, are
systematically related to events of some other sort, the effects, it may become possible for
us  to  alter  our  environment  by  producing  (or  by  preventing)  the  occurrence  of  certain 
kinds  of  events.  Causal  reasoning  refers  to  the  use  of  knowledge  about  cause-effect 
relationships  in  the  world  to  support  plausible  inferences  about  events.  Example 
applications  of  automated  causal  reasoning  systems  include  solving  diagnostic  problems, 
determining  guilt/innocence  in  legal  cases,  and  interpreting  events  in  daily  life.  Causal 
reasoning  has  been  treated  mathematically  as  a  formal  causal  model  and  graph  (Pearl 
2003,  Pearl  2000). 

Causal  reasoning  has  had  problems  with  figuring  out  how  to  handle 
compounding,  happenstance,  and  chaos.  Causations  risk  oversimplifying  complex 
phenomena. People tend to use a one-to-one cause-and-effect notion. We often read “Fed
interest  rate  increase  will  tame  inflation”  when  the  reality  is  much  more  complex.  Major 
causes  of  inflation  reduction  might  be  the  existence  of  Walmart  and  the  deflationary 
effects  of  the  global  pool  of  labor.  There  is  a  debate  on  whether  causation  is  fundamental. 
Causality  is  probably  an  anthropomorphic  notion.  Once  a  mechanism  of  physical  or 
social  processes  is  known,  causality  becomes  secondary,  which  is  to  say,  it  functions  as 
simplified  explanations.  On  the  other  hand,  causal  reasoning  is  prevalent  in  human 
beings.  Ignoring  it  is  unwarranted,  especially  in  any  social  modeling. 


3.1.5.1 Rule-Based versus Causal Reasoning


The “if-then” rule-based inference has an unfortunate artifact of producing incorrect
inferences if knowledge engineers do not take special precautions in encoding the rules.
This artifact is demonstrated by the following incorrect inference from two correct rules
using a correct inference mechanism (chaining):

Rule 1: If the lawn is wet, then it rained

Rule 2: If we break the water main, then the lawn gets wet

Inference: If we break the water main, then it rained

Thus there is a need to explicitly represent causality, which includes representing
actions instead of just observations and addressing confounding. Incorporating causality
would enable a proper adjustment to the above rules:

Cause 1: Raining causes the lawn to be wet

Cause  2:  Breaking  the  water  main  causes  the  lawn  to  be  wet 

Inference:  None 

As shown, Rule 1 turns out to be encoded erroneously once causal relations are taken into
account. While erroneous in its cause-effect relation, Rule 1 can still be useful as a suggestion or
hint.  Causal  reasoning  is  similar  to  the  deductive  reasoning  process  while  rule-based 
reasoning  is  similar  to  the  inductive  reasoning  process.  Thus  both  the  rule-based  and  the 
causal  inferences  are  useful. 
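
The lawn example can be reproduced with a small sketch that contrasts chaining over "if-then" rules with following edges in a causal graph. The encoding of both as adjacency lists and the identifier names are assumptions made for illustration.

```python
def reachable(graph, start):
    """All nodes that can be reached from start by following directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# "If X then Y" rules, read as edges X -> Y (Rule 2, then Rule 1).
rules = {"break_water_main": ["lawn_wet"], "lawn_wet": ["rained"]}

# Causal relations, read as edges cause -> effect (Cause 1 and Cause 2).
causes = {"rained": ["lawn_wet"], "break_water_main": ["lawn_wet"]}

chained = reachable(rules, "break_water_main")   # contains "rained" (incorrect)
caused = reachable(causes, "break_water_main")   # contains only "lawn_wet"
```

Chaining reaches "rained" from "break_water_main", while the causal graph yields no such inference, because the edge between rain and the wet lawn points from cause to effect rather than from observation to explanation.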


3.1.6  Probabilistic  Reasoning 

Uncertainty  is  inherent  in  many  problems  because  the  real  world  does  not  operate  as  a 
Boolean  system.  To  handle  uncertainty,  probabilistic  reasoning  is  employed  in  artificial 
intelligence.  There  are  several  ways  to  do  probabilistic  reasoning:  certainty  factor, 
Bayesian  Networks,  fuzzy  logic,  etc.  Bayesian  Networks  is  currently  the  most  widely 
used  model  in  artificial  intelligence,  robotics,  and  machine  learning  for  probabilistic 
reasoning. 


3.1.6.1 Certainty Factors

Certainty  factors  provide  a  simple  way  of  updating  probabilities  given  new  evidence.  A 
certainty  factor  is  used  to  express  how  accurate,  truthful,  or  reliable  a  rule  is  assessed  to 
be.  It  is  used  in  the  MYCIN  expert  system  (Buchanan  and  Shortliffe  1984). 
Mathematically, a certainty factor is a number in the range -1.0 to +1.0, which is
associated with a rule. A certainty factor of 1.0 means the rule or proposition is certainly true.
A certainty factor of 0.0 means the rule is judged to be agnostic as there is no information
available about whether the rule is true or not. A certainty factor of -1.0 means the rule is
certainly false. A certainty factor of 0.7 means that the rule is quite likely to be true, and
so on.

Certainty  factors  are  inelegant  theoretically,  but  in  practice  this  tends  not  to  matter 
too  much.  This  is  mainly  because  the  error  in  dealing  with  uncertainties  tends  to  lie  as 
much  in  the  certainty  factors  attached  to  the  rules  (or  in  conditional  probabilities)  as  in 
how  the  rules  with  certainty  factors  are  manipulated.  These  certainty  factors  are  usually 
based  on  rough  guesses  of  experts  in  the  domain,  rather  than  based  on  actual  statistical 
estimations. These guesses tend not to be very good. Certainty factors are not bound by the
laws of probability.
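
How two certainty factors for the same conclusion are merged is not spelled out above. The following minimal sketch uses the combination function commonly associated with MYCIN-style systems; the function name combine_cf is hypothetical, and the formula should be read as the standard textbook form rather than as something taken from this thesis.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1.0, 1.0] for the same conclusion."""
    if cf1 >= 0 and cf2 >= 0:
        # Two pieces of supporting evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        # Two pieces of opposing evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)
    # Mixed evidence; undefined (division by zero) for +1.0 against -1.0.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))
```

For example, combine_cf(0.7, 0.5) gives 0.7 + 0.5 * 0.3 = 0.85, and combine_cf(0.7, -0.5) gives 0.2 / 0.5 = 0.4. The result always stays within the range -1.0 to +1.0, but the numbers do not behave like probabilities.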


3.1.6.2 Bayes Theorem and Bayesian Networks

The  essence  of  the  Bayesian  approach  (Neapolitan  2003)  is  a  mathematical  rule 
explaining  how  one  should  change  one's  existing  beliefs  in  the  light  of  new  evidence.  The 
Bayesian  approach  is  founded  on  Bayes  Theorem,  an  expression  of  correlations  and 
conditional  probabilities.  Conditional  probabilities  represent  the  probability  of  an  event 
occurring given evidence. Bayes Theorem can be derived from the joint probability of A
and B (i.e., p(A,B)) as follows:

p(A,B) = p(B,A)
p(A|B) p(B) = p(B|A) p(A)
p(A|B) = (p(B|A) p(A)) / p(B)

where p(A|B) is referred to as the posterior, p(B|A) is known as the likelihood, p(A) is the
prior and p(B) is generally the evidence.
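
A small numerical sketch may make the roles of prior, likelihood, and evidence concrete. The numbers and the function name posterior are hypothetical; the evidence term is expanded with the law of total probability over A and not-A.

```python
def posterior(prior, likelihood, likelihood_given_not_a):
    """p(A|B) from p(A), p(B|A), and p(B|not A)."""
    # p(B) = p(B|A) p(A) + p(B|not A) p(not A)
    evidence = likelihood * prior + likelihood_given_not_a * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical values: p(A) = 0.01, p(B|A) = 0.9, p(B|not A) = 0.05
p_a_given_b = posterior(prior=0.01, likelihood=0.9, likelihood_given_not_a=0.05)
```

With these values the posterior is 0.009 / 0.0585, roughly 0.15: even fairly strong evidence leaves the posterior modest when the prior is small.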

A  Bayesian  or  belief  network  represents  the  same  information  as  a  joint 
probability distribution, but in a more concise format. The graph of a network has nodes
which represent variables and directed edges which represent conditional dependencies.
This directed graph must not contain directed cycles. The nodes are connected by
arrows or directed edges which show the influence of the variables upon one another.
Each node has a conditional probability table that quantifies the effects of the other nodes
that have influences on it.
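
As a minimal sketch of these ideas, consider a hypothetical three-node network in which rain and a broken water main both influence whether the lawn is wet. All probabilities below are invented for illustration, and inference is done by brute-force enumeration of the joint distribution rather than by any efficient algorithm.

```python
from itertools import product

P_RAIN = 0.2    # prior of the root node "rain" (hypothetical)
P_BREAK = 0.01  # prior of the root node "brk" (hypothetical)
# Conditional probability table: P(wet = True | rain, brk)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.95, (False, False): 0.05}

def joint(rain, brk, wet):
    """Joint probability as the product of each node's (conditional) table entry."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_BREAK if brk else 1 - P_BREAK)
    p_wet = P_WET[(rain, brk)]
    return p * (p_wet if wet else 1 - p_wet)

def query(target, evidence):
    """P(target = True | evidence) by summing the joint over consistent worlds."""
    names = ["rain", "brk", "wet"]
    num = den = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if world[target]:
            num += p
    return num / den
```

Here query("rain", {"wet": True}) is about 0.79, while query("rain", {"wet": True, "brk": True}) drops to about 0.21: once the broken main is known, it explains the wet lawn and rain becomes much less likely.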

Bayesian methods have been successfully applied to a wide range of problems.
They  are  easy  to  understand  and  elegant  mathematically.  They  are  based  on  classical 
probability,  and  thus  are  considered  sound  by  most  researchers  and  have  the  aura  of 
scientific  respectability  even  though  the  specification  of  priors  does  not  have  a  rigorous 
treatment.  The  determination  of  conditional  independence  and  Markovian  property  is 
partially  based  on  judgment  calls. 

In  certain  circumstances,  however,  Bayesian  methods  are  not  appropriate.  Let  A 
represent  the  proposition  Kirsten  Dunst  is  attractive.  The  axioms  of  probability  insist  that 

P(A) + P(¬A) = 1

Now  suppose  that  a  person  does  not  even  know  who  Kirsten  is.  We  cannot  say  that  this 
person  believes  the  proposition  if  he/she  has  no  idea  what  it  means.  Moreover,  what 
makes  a  person  attractive  varies  across  cultures  and  persons.  Neither  is  it  fair  to  say  that 
he/she disbelieves the proposition. It should therefore be reasonable and meaningful to
denote his/her beliefs bel(A) and bel(¬A) as both being 0.

Bayesian  networks  are  a  powerful  method  for  representing  and  reasoning  with 
uncertainty.  Most  of  the  applications  of  Bayesian  networks,  however,  have  been  in 
academic  exercises  rather  than  industrial  applications  that  real  businesses  rely  on.  The 
main reason why Bayesian networks have not yet been deployed in many significant
industrial-strength applications lies in their knowledge acquisition bottleneck. It is
immensely  hard  to  acquire  conditional  probability  relations  and  priors  correctly  from 
human  experts.  This  lack  of  industrial  applications  of  Bayesian  networks  stands  in 
contrast  with  the  successful  industrial  applications  of  simulations  and  expert  systems. 


3.1.6.3 Inference in Artificial Neural Networks

An  Artificial  Neural  Network  (ANN)  is  an  information  processing  method  that  is  inspired 
by  the  way  biological  nervous  systems,  such  as  the  cortex,  process  information  (Dayhoff 
1990,  Anderson  1995).  It  is  composed  of  a  large  number  of  highly  interconnected 
processing elements (neurons) working in concert to solve specific problems. ANNs, like
people, learn by example. An ANN is configured for a specific application, such as
pattern recognition or data classification, through a learning process. Learning in
biological systems involves adjustments to the synaptic connections that exist between
the neurons. This is true for ANNs as well. ANNs can handle numerical pattern
classifications well but not symbolic reasoning.
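
Learning by adjusting connection weights can be illustrated with the simplest case, a single perceptron. This is a minimal sketch under stated assumptions: one neuron, a step activation, integer weights, and the logical AND function as the hypothetical training set.

```python
def predict(weights, bias, x):
    """Step activation: fire (1) if the weighted sum is positive, else 0."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

def train_perceptron(samples, epochs=20, lr=1):
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Adjust each connection in proportion to its input and the error.
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias

# Training examples for logical AND: ((x1, x2), target)
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
```

After training, the neuron classifies all four examples correctly. What it has learned is a set of numerical weights, which illustrates why such networks suit numerical pattern classification and do not directly support symbolic reasoning.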


3.1.6.4 Fuzzy Logic

Fuzzy logic (Kosko 1996) allows partial set or fuzzy membership rather than crisp set
membership. This gives birth to the name fuzzy logic. It is a variant of multi-value logic.
Fuzzy logic commences with and builds on a set of linguistic rules provided by humans,
usually containing soft or qualitative variables. The fuzzy systems convert these rules to
their mathematical equivalents using membership functions. This makes the task of the
software architect simpler and results in closer representations of the way systems behave
in the real world, especially when soft or qualitative variables are involved. Additional
benefits of fuzzy logic include its simplicity and its flexibility. Fuzzy logic can handle
problems with imprecise and/or incomplete data, and it can model nonlinear functions of
arbitrary complexity.
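
A membership function for a linguistic variable can be sketched as follows. The triangular shape, the breakpoints in centimeters, and the use of min, max, and complement for the fuzzy connectives are common textbook choices assumed here for illustration; they are not taken from Kosko (1996).

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0.0, 1.0] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

# Hypothetical linguistic variable "tall" over height in centimeters.
tall_170 = triangular(170, 160, 180, 200)   # partially tall
tall_180 = triangular(180, 160, 180, 200)   # fully tall
```

A height of 170 has membership 0.5 in "tall", so it also has membership 0.5 in "not tall". Partial membership of this kind is what distinguishes fuzzy sets from crisp sets.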

Weaknesses of fuzzy logic include the use of ad-hoc non-linear truncation and
jagged interpolation in its membership functions and the fuzziness of the qualitative
symbolic data - linguistic variables - such as “very tall”, “tall”, etc. The vagueness of
fuzzy variables hinders the exact representation and reasoning needed for the rigor of
sound science. Furthermore, multi-value logic such as fuzzy logic has a larger risk of
losing its meaning as the number of multiple-logic-values increases. Even though in the
end a fuzzy variable gets mapped into real numbers, exactness is crucial and cannot be
guaranteed by fuzzy logic.

3.1.7 Evidential Reasoning


Instead  of  focusing  on  the  truth  value  or  probabilistic  value  of  assertions  and 
propositions,  which  may  be  abstract,  evidential  reasoning  focuses  on  the  evidence  itself 
and  the  data  generation  processes.  Evidential  reasoning  requires  several  conditions  to 
operate,  such  as: 

o Falsifiability: it must be possible to conceive of contrary evidence that would
prove a claim false.

o Comprehensiveness: the evidence offered in support of any claim must be
exhaustive.

o  Logic:  any  argument  offered  as  evidence  in  support  of  any  claim  must  be 
sound.  An  argument  is  said  to  be  "valid"  if  its  conclusion  follows  unavoidably 
from  its  premises.  It  is  "sound"  if  it  is  valid  and  if  all  the  premises  are  true. 


3.1.7.1 Dempster-Shafer Theory of Evidence

The  Dempster-Shafer  Theory  of  Evidence  (Russell  and  Norvig  2003)  was  introduced  as  a 
way  of  representing  epistemic  knowledge.  In  this  formalism,  the  best  representation  of 
chance  is  a  belief  function  rather  than  a  Bayesian  mass  distribution.  There  are  two 
measures  of  certainty:  belief  and  plausibility.  Belief  denotes  the  support  each  conclusion 
has  from  the  observations.  Plausibility  accounts  for  all  observations  that  do  not  rule  out  a 
given conclusion. The appeal of the Dempster-Shafer Theory of Evidence (DST) rests on the fact
that it more naturally encodes evidence instead of propositions. Bayesian theory is included in
the  theory  of  evidence  as  a  special  case,  since  Bayesian  functions  are  belief  functions, 
and  Bayes'  rule  is  a  special  case  of  Dempster's  rule  of  combination. 
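
Belief and plausibility can be computed from a mass assignment over subsets of the frame. The sketch below uses the standard definitions (belief sums the mass of subsets of the hypothesis, plausibility sums the mass of sets that intersect it); the two-element frame and the mass values are hypothetical.

```python
def belief(mass, hypothesis):
    """Total mass of non-empty focal sets contained in the hypothesis."""
    return sum(p for focal, p in mass.items() if focal and focal <= hypothesis)

def plausibility(mass, hypothesis):
    """Total mass of focal sets that do not rule out the hypothesis."""
    return sum(p for focal, p in mass.items() if focal & hypothesis)

A = frozenset({"A"})
NOT_A = frozenset({"not_A"})
FRAME = A | NOT_A

# Total ignorance: all mass sits on the whole frame.
ignorant = {FRAME: 1.0}
# Partial evidence for A, with the remainder left uncommitted.
informed = {A: 0.6, FRAME: 0.4}
```

Under total ignorance, belief(ignorant, A) and belief(ignorant, NOT_A) are both 0 while both plausibilities are 1, which a single probability value cannot express. With the partial evidence, the belief in A is 0.6 and its plausibility is 1.0.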

The  Dempster-Shafer  Theory  of  Evidence  is  not  as  widely  applied  as  the  Bayesian 
Networks  due  to  the  following: 

o  It  is  not  based  on  classical  probability.  Thus  it  is  deprived  of  the  aura  of 
scientific  respectability. 

o In the late 1970s, Lotfi Zadeh wrote a critique that states that Dempster’s Rule
of Combination has a fundamental discrepancy in that it may yield
counterintuitive results when given conflicting information (Zadeh 1984).

o However subjective and arbitrary Bayesian priors are, they are simple to
understand. Furthermore, Bayesian Networks and Bayesian Statistics can be
elegantly formulated mathematically.

Recent research shows that Zadeh’s critique against Dempster’s Rule of Combination is
unjustified (Haenni 2005). A compelling but ultimately erroneous example based on
Zadeh’s critique is as follows. Suppose that a patient is seen by two doctors regarding the
patient’s neurological symptoms. The first doctor believes that the patient has either
meningitis with a probability of 0.99 or brain tumor with a probability of 0.01. The
second doctor judges the patient suffers from a concussion with a probability of 0.99 but
admits the possibility of a brain tumor with a probability of 0.01. Using the values to
calculate the m(brain tumor) with Dempster’s rule, it is found that m(brain tumor) =
bel(brain tumor) = 1.0, which means it is 100% believed that the brain tumor is the
correct diagnosis. This result implies a complete support for a diagnosis that both doctors
consider to be very unlikely. A common but mistaken explanation for this is that the
possible conflicts between different pieces of evidence are mismanaged by Dempster’s
Rule of Combination. This is a very compelling example, which is why it contributed
to  the  near  demise  of  Dempster-Shafer  Theory  of  Evidence  research.  Many  researchers 
have  used  this  example  to  completely  reject  DST  or  construct  alternative  combination 
rules  (Sentz  2003). 
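
The two-doctor example can be reproduced numerically. The sketch below implements Dempster's rule in its usual form (multiply masses, keep non-empty intersections, renormalize by the non-conflicting mass); the function name dempster_combine and the encoding of focal sets as frozensets are assumptions made for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions; returns (combined masses, conflict)."""
    combined = {}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + pa * pb
            else:
                conflict += pa * pb   # mass falling on the empty set
    agreeing = sum(combined.values())  # equals 1 - conflict
    if agreeing == 0.0:
        raise ValueError("sources are in total conflict")
    return {s: p / agreeing for s, p in combined.items()}, conflict

M = frozenset({"meningitis"})
C = frozenset({"concussion"})
T = frozenset({"brain_tumor"})

doctor1 = {M: 0.99, T: 0.01}
doctor2 = {C: 0.99, T: 0.01}
combined, conflict = dempster_combine(doctor1, doctor2)
```

The only non-empty intersection is the brain tumor, with unnormalized mass 0.01 * 0.01 = 0.0001; the remaining 0.9999 is conflict. After renormalization all mass sits on the brain tumor, which is the counterintuitive value of 1.0 described above.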

The  counterintuitive  result  turns  out  not  to  be  caused  by  a  problem  within 
Dempster’s  Rule  of  Combination,  but  rather  by  a  problem  of  misapplication.  Zadeh’s 
model  does  not,  in  fact,  correspond  to  what  people  have  in  mind  in  such  a  case.  There  are 
two  different  ways  to  fix  the  problem  (Haenni  2005). 

One  way  is  based  on  the  observation  that  the  diseases  are  not  exclusive.  The 
simple set Θ = {meningitis, concussions, brain tumor} implies exactly one of these
diseases is the true one. In reality, diseases are almost never exclusive, so Zadeh’s choice
for Θ is in question. Switching from {meningitis, concussions, brain tumor} to its power
set (that is, the combinations of diseases) would give the frame of reference Θ = {∅, M,
C, T, MC, MT, CT, MCT}, where M = meningitis, C = concussions, T = brain
tumor. Using this power set, Dempster’s Rule of Combination (DRC) behaves normally
(Haenni 2005).

The  other  way  is  based  on  the  observation  that  experts  are  not  fully  reliable 
(Haenni  2005). 

Thus as long as the right model is utilized, DRC is a good method to combine
evidence, thus making DST useful. In fact, DRC behaves very well under conflict, and
especially under high conflict. This suggests DST is as reliable as - if not more so than -
Bayesian Methods. The caveat here is that, of course, the model must be correct. In many
applications, it is not trivial to derive the correct model. Furthermore, to use DST, we
first have to choose the Frame of Reference over a certain parameter space. The
choice of the Frame of Reference for simulation systems is non-trivial and usually ad
hoc. Additionally, ignorance implies a contradiction in simulation systems (of which we
have a God's-eye view) or when validation knowledge is provided.


3.1.7.2 Data Fusion

Data Fusion (Hall and Llinas 2001) is the process of combining multiple data for the
purpose of producing information of better value than the individual processing of data
alone. Data originate from many sources. Sources may be similar, such as multiple
radars, or dissimilar, such as acoustic, electro-optic, electro-mechanical (e.g., haptic
devices), or passive electronic emissions measurement. A key issue is the ability to deal
with conflicting data and to produce intermediate results that can be revised as more data
become available.

Instead  of  using  Dempster’s  Rule  of  Combination,  in  certain  domains  it  is  more 
reasonable  to  directly  use  the  domain  knowledge  and  ontology  to  combine  evidence.  If 
domain knowledge and ontology can be adequately specified, it can become more
straightforward to use them to combine evidence and do reasoning. Dempster’s Rule of
Combination  and  Dempster-Shafer  Theory  of  Evidence,  while  useful,  are  general 
formalisms.  Specific  domain  knowledge,  on  the  other  hand,  has  specialized,  precise,  and 
effective  application. 

3.2 Inference Techniques in Scientific Method


Strangely  the  field  of  artificial  intelligence  (Russell  and  Norvig  2003)  largely  ignores 
how  scientists  carry  out  experiments,  build  models  and  hypotheses,  accumulate 
knowledge, test hypotheses, construct theories, and draw conclusions. The exception is in
the  use  of  inductive  logic  for  scientific  discovery.  But  due  to  the  inherent  weaknesses  of 
induction,  the  method  is  not  part  of  the  main  artificial  intelligence  reasoning  techniques. 
It  can  be  given  any  other  name,  but  the  above  scientific  activities  can  be  best  described 
by  the  word  and  notion  of  inference.  It  is  known  more  commonly  as  the  scientific 
method,  which  is  fundamental  in  advancing  science.  No  other  methods  of  inference  have 
matched  the  effectiveness  of  the  scientific  method  in  understanding  the  real  world. 


3.2.1  Statistical  Inference 

Statistical  inference  deals  with  the  problem  of  inferring  properties  of  an  unknown 
distribution  from  data  generated  by  that  distribution.  The  most  common  type  of  inference 
involves  approximating  an  unknown  distribution  by  choosing  a  distribution  from  a 
limited  family  of  distributions.  Generally  this  family  of  distributions  is  specified 
parametrically. 
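
Choosing a distribution from a parametric family can be sketched with the normal family, whose two parameters are estimated from the data. This is a minimal sketch assuming maximum likelihood estimation; the function name fit_normal and the sample values are hypothetical.

```python
import statistics

def fit_normal(data):
    """Maximum likelihood estimates (mean, standard deviation) of a normal model."""
    mu = statistics.fmean(data)
    # The maximum likelihood estimate divides by n, i.e. the population form.
    sigma = statistics.pstdev(data, mu)
    return mu, sigma

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = fit_normal(sample)
```

For this sample the estimates are a mean of 5 and a standard deviation of 2. The whole sample is summarized by two numbers, which is also why unusual individuals tend to vanish into the tails of the fitted distribution.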

One  of  the  weaknesses  of  statistical  inference  is  that  it  is  biased  toward  the 
majority  of  a  sample  population.  If  there  is  a  person  who  has  a  very  eccentric  personality 
and lifestyle, this person will often be considered as an outlier, noise, or an error. A
sufficient  number  of  eccentric  individuals  may  cause  Type  I  Error.  While  focusing  on 
getting  accurate  statistics  of  sample  populations  is  useful,  in  many  cases  eccentric 
individuals  are  the  key  to  describing  and  predicting  events.  For  example,  international  air 
travelers  played  a  major  role  in  the  spread  of  Severe  Acute  Respiratory  Syndrome 
(SARS)  in  2003  before  it  was  contained. 

Moreover,  statistical  inference  lacks  a  means  to  describe  causal  relations.  It  also 
lacks an effective means to handle symbolic knowledge. During the SARS outbreak, the
symbolic knowledge of how the disease spreads was a crucial clue of how to handle the
outbreak. Statistical inference also assumes independence between samples: it does not
take networks into account. Social networks play an important role in human society and,
by extension, in multi-agent systems modeling human society.

The above weaknesses can be remedied using a combination of simulation and
knowledge  inference.  Simulation  allows  high  precision  in  modeling  a  sample  point  (e.g., 
as  an  agent  and  as  networks).  Knowledge  inference  allows  the  reasoning  based  on  the 
structures  of  the  real  world  problems  to  augment  statistics. 


3.2.2  Hypothesis  Building  and  Testing 

There  are  really  only  two  ways  to  ascertain  how  the  world  works.  One  way  is  to  talk  and 
argue  about  it,  but  this  is  unreliable  as  arguments  and  words  alone  cannot  determine  if  a 
statement  is  true.  Logic,  while  helpful,  is  not  without  difficulty:  the  premises,  the 
assumptions,  and  the  application  of  logical  entailments  need  to  be  correct.  Logic  does  not 
exist  in  a  vacuum.  Furthermore,  how  the  physical  and  social  worlds  operate  does  not 
conform  to  “logical”  commonsense  or  even  pure  logic.  Much  of  physics  is 
counterintuitive.  Thus  proofs  based  on  observations  and  experiments  are  required. 

A  better  way  is  to  perform  careful  observations  and  carry  out  experiments.  The 
result  of  doing  this  is  universal  as  it  is  reproducible  by  any  skeptic.  This  forms  the  basis 
of  the  scientific  method,  which  is  the  best  way  yet  discovered  for  discerning  the  truth 
from  delusions  and  untruths.  The  basic  steps  of  the  scientific  method  are: 

1)  Observe  some  part  of  the  reality. 

2)  Introduce  a  tentative  description  -  a  hypothesis  -  that  is  consistent  with  the 
observation. 

3)  Use  the  hypothesis  to  make  predictions. 

4)  Test  the  predictions  by  experiments  or  further  observations  and  modify  the 
hypothesis  in  the  light  of  the  results. 

5) Repeat steps (3) and (4) until there are no discrepancies between the
hypothesis and experiment and/or observation.

When there are no discrepancies left between the hypothesis and the experiment and/or
observation, consistency is obtained. Consistency is crucial to establish validity. Based on
this consistency within a class of phenomena, the hypothesis becomes a theory. A theory
is a framework within which observations are explained and predictions are made. The
above steps show that knowledge is physically constructed from the empirical primitives
through hypothesis formation and testing. In other words, meanings are constructed
from the ground up from facts. Systems can only be understood in terms of the physical
processes which manifest them and by which they are assembled. Semantics, ontology,
language, and mathematics must be understood in the context of physical reality.

What is lacking in artificial intelligence inference techniques is a careful
application of the scientific method. Artificial intelligence focuses on specific inference,
search, and/or learning algorithms. Machine learning focuses on general learning
algorithms. The crucial task of constructing and testing hypotheses is left for human
researchers to perform. Part of the reason is that observation, which requires visual
recognition, is hard to automate. However, even in the modeling and simulation field (in
which visual recognition is less of an obstacle), hypothesis building and testing, which
requires knowledge-intensive model and meta-model building, is left without automation.
Policy iteration in reinforcement learning is one artificial intelligence technique that is
analogous to hypothesis building.

Using social simulations grounded by empirical knowledge and data as a proxy of
the real world, a novel inference technique can perform experiments in simulations and
build hypotheses using inference engines by tweaking meta-models of the simulations. If
a means of sensing and manipulation becomes available, then real world experimentation
is enabled, supplementing simulation-based experimentation.

The question of search space and computational complexity may be addressed by
performing careful hypothesis building and testing. Hypothesis building and testing
requires deep knowledge and educated guesses. In other words, it requires deduction and
induction. Achieving human intuition is hard, but intuition can be at least partially
emulated using deduction and induction. Knowledge inference and meticulous virtual
experimentation are the first step toward scientific-method-capable artificial intelligence.
Here, the simulation with its models and meta-models is the representation of real world
knowledge. Instead of representing the real world as Bayesian Networks or other
conventional artificial intelligence representations, it is represented more faithfully as
simulations.

3.3 Knowledge-based Hypothesis Formation and Testing


Simulations serve as a proxy for the real world. If the simulation representation of the real
world is good enough for the policy question at hand, then experiments performed in the
simulation would likely remain valid in the real world and the simulated experimental
results would mimic the results of real world experiments.

When knowledge inference, ontological reasoning, and simulation are combined,
hypotheses can be constructed by:

(1)  Searching  the  knowledge-base  for  unknown  and/or  uncertain  knowledge 
areas. 

(2)  Inference  using  the  knowledge-base  to  detect  the  probable  existence  of  new 
rules. 

(3)  Discovery  of  data  patterns  that  do  not  fit  into  any  of  the  knowledge-base  rules. 

(4)  Opportunistic  search  and  fitness-based  search. 

(5)  Knowledge-based  ontology  search  or  ontological  reasoning. 

(6)  Classification  examination  by  ontological  reasoning. 

Hypotheses  can  be  tested  by  proxy  using  simulations.  Empirical  data  is  used  to  validate 
the  simulations  for  the  policy  question. 

The  above  differs  from  inductive  logic  programming  in  that  it  is  not  solely  reliant 
on  logic.  It  uses  deep  knowledge  and  pattern  analysis  for  induction.  It  also  goes  in  the 
deductive  direction,  working  from  domain  conceptual  knowledge  to  suggest  a  probable 
existence  of  new  rules  or  ontological  categories.  It  perturbs  existing  knowledge  and 
model  to  find  a  fit  to  new  experimental  results  and/or  observations. 

Natural science provides examples of the power and insight of a proper
classification, a kind of ontology. Two successful examples are the Darwinian
evolutionary classification of life forms, established even before the arrival of DNA
classification, and the Periodic Table of chemistry, which can predict the existence of
yet-to-be-discovered chemical elements. The strengths of these classification ontologies
derive from the fact that they focus on natural processes. Scientific understanding of the
natural processes underlies the classification ontologies. The Standard Model of particle
physics is another example of a successful classification ontology. All these indicate the
utility of knowledge-based and ontological reasoning when properly used, especially with
careful consideration of natural processes.

3.4 Summary


Current  artificial  intelligence  reasoning  techniques  have  strengths  and  weaknesses 
summarized  in  the  following  table. 


Table  1.  Reasoning  Methods  Comparison 


| Method | Strengths | Weaknesses |
|---|---|---|
| Search in search space | Generality | Computational complexity |
| Search in production systems | Operates in symbolic space, usually a smaller space than state space | Naive implementation causes long processing time. Rete I fixed this but at the cost of memory. Rete II and Rete III improved upon Rete I, but are not in the public domain. |
| Search in genetic and evolutionary systems | Powerful and robust | Incremental change, hard to avoid suboptimal solutions, knowledge-less evolutionary operators of mutation and crossover |
| Logical inference | Clear, concise, and if the facts and the inference mechanism are watertight, the inference is watertight. | Not all natural and social phenomena are logical. Need to assign true or false values to every statement. |
| Rule-based inference | Ability to capture expert knowledge | The inference is only as good as the quality of expert heuristic knowledge. If the rules are derived from the conceptual model, however, this weakness disappears. |
| Causal inference | Ability to emulate causal reasoning | Causal reasoning thrives when the mechanisms are still unclear or undiscovered. It risks oversimplifying complex phenomena. |
| Certainty factors | Simple but workable | Bound by the laws of probability. Ignorance cannot be modeled. |
| Bayesian networks | Easy to understand, mathematically elegant, based on classical probability | Non-rigorous priors, bound by the laws of probability. Ignorance cannot be modeled. |
| Artificial Neural Networks | Mimics natural neural circuits to a degree. Adaptive learning, self-organization, real-time operation, and fault tolerance. | Cannot operate on symbolic information. In pattern recognition, it is subsumed by the Support Vector Machine (Vapnik 2000). |
| Fuzzy logic | Simple and flexible | Fuzzy. Brittle membership functions. Multi-value logic risks losing meanings when the number of logic-values increases. |
| Dempster-Shafer | General, robust, and reliable. Generalizes Bayesian methods. Ignorance can be modeled. | Need to be careful in modeling to avoid errors in evidence combination of conflicting information. Ad-hoc Frame-of-Reference determination. |
| Data Fusion | Specialized methods, including ontology-based method | Not general. Strengths and weaknesses depend on chosen methods and application domain |
| Statistical inference | Mathematically sound. | Cannot operate on symbolic information. Causality cannot be modeled. Minority eccentric individuals smoothed over. |
| Hypothesis building and testing (scientific method) | General and powerful method. Not limited to numerical information. | More complex than statistical inference. Much more knowledge-intensive. Requires “intelligence” to construct hypotheses |

The  most  promising  techniques  are  knowledge-based  methods  and  hypothesis 
building  and  testing  based  on  the  scientific  method.  Knowledge-based  methods  work 
mostly  on  symbolic  information.  Hypothesis  testing  in  statistics  works  mostly  on 
numerical  data.  Hypothesis  testing  based  on  symbolic  information  is  only  used  manually. 
Knowledge-based  hypothesis  building  and  testing  allows  the  processing  of  both 
numerical  and  symbolic  data. 

Combining knowledge-based methods (including causal and rule-based systems)
and hypothesis testing - both numerical and symbolic - is a good way to create a
validation and model-improvement system for simulations. Instead of focusing on pure
logic, causal logic (and thus our knowledge-based methods) allows a focus on the
processes and/or mechanisms of the real world. As knowledge-based hypothesis building
and testing, augmented by simulations and focusing on processes and mechanisms, is
similar to what human scientists do in their scientific work, it might form an empirical
path toward artificial intelligence.

Chapter IV: What-If Analyzer (WIZER)


This chapter describes a tool based on the knowledge-based and ontological approach for
validation. First, I elaborate how the tool works conceptually. Next, I describe the
detailed components of the tool. These include the knowledge spaces, the Alert module,
and the Inference Engine. The Alert module performs data description and matching,
resulting in symbolic information in addition to numeric information. The Inference Engine
performs inferences on simulation events. In the knowledge space, I create and use a
simulation description logic, which is inspired by ontology (Gomez-Perez et al. 2004)
and the DAML+OIL inference language, to describe the simulation model and results.
This integration effort follows similar integration efforts on Logic Programs and
Description Logic (Grosof et al. 2003) and on Logic Programs and Production Systems
(Grosof 2005).

I call the tool WIZER, for What-If Analyzer. While WIZER enables validation, I
also describe how it enables model-improvement. Next, I give an illustrated run of the
tool. Finally, a feature comparison between WIZER and other tools is provided.

As WIZER is a knowledge-based tool, the importance of knowledge - and the
reasoning based on that knowledge - is emphasized in this chapter. While WIZER uses
statistical tools, they are used in the context of knowledge bases and inferences. The
simulation output curves have knowledge components behind them and they can be
described based on knowledge. All inference rules and descriptions about statistical tools
are encoded declaratively first, with additional supporting routines encoded imperatively
(in a procedural manner).

The main obstacle in any knowledge-based tool is the knowledge acquisition
bottleneck, which is the difficulty of extracting knowledge from human experts. WIZER
partially avoids this knowledge acquisition bottleneck by distributing the knowledge
acquisition responsibility to the corresponding stakeholders: simulation knowledge to the
simulation developers and validation knowledge to the validation evaluators. Causal
learning from data and machine learning techniques can be used to address the knowledge
acquisition bottleneck. This dissertation only gives an example of knowledge-based search
and hypothesis testing for acquiring new knowledge in the form of new causal relations.


4.1  How  WIZER  Works  Conceptually 


WIZER includes a knowledge space module, the Alert module, and the Inference
Engine module. The knowledge space module contains causation rules for the simulation
model and the domain knowledge in the form of a graph. The graph's nodes represent
entities, while the edges represent relationships. The Alert module does two tasks: (1)
describing the data, e.g., using statistical and pattern classification tools, and (2) matching
that data description with empirical data, producing symbolic alerts. Symbolic alerts here
are defined to be symbolic characterizations of numerical data (not just alerts in the sense
of imminent danger). These symbolic alerts allow WIZER's Inference Engine to process
causation and IF-THEN rules. (The Inference Engine can also consider numerical data.)
The principle of inference is a simple one: being able to derive new data from data that is
already known. The Inference Engine module takes in the outputs of the Alert module,
performs inferences on them, and produces recommendations on which variable to
change and by how much. The inferences are aided by ontology. An ontology is defined
as a specification of a conceptualization. Every knowledge-based system is committed to
some conceptualization. Here I choose to make the conceptualization explicit, using
ontology. The Alert and the Inference Engine modules can be used on their own given
appropriate inputs. Figure 3 (below) shows the diagram of WIZER.

Figure 3. WIZER Diagram

The  Domain  Knowledge  Space  module  provides  domain  knowledge  to  the 
Inference  Engine.  The  knowledge  is  in  the  form  of  graphs  or  networks.  The  other  name 
for  domain  knowledge  space  is  domain  ontology;  they  are  assumed  to  be  the  same  here. 
The  empirical  data  could  change  the  domain  knowledge  and  the  domain  knowledge  could 
restrict  and  influence  what  empirical  data  is  acceptable.  This  depends  on  the  strength  of 
evidence  supporting  the  knowledge  and  the  data. 

The Simulator Knowledge Space module provides simulator knowledge, such as
the causal network of the simulation model, to the Inference Engine. The
Inference Engine produces new parameter values and possibly new links for the
Simulator Knowledge Space module. The simulator influences and is influenced by the
Simulator Knowledge Space module. The parameter data used in the simulator is
assumed to be contained in the Simulator Knowledge Space module. The parameter data
is empirical, but it is used inside the simulator. As the empirical data used in
the simulator is not the same as the data used for validation, this separation makes the
distinction conceptually clear.

Both domain and simulator knowledge spaces are represented by a graph. More
significantly, I created a new derivation of description logic to describe the knowledge
spaces, the simulation model, the simulation outputs, empirical data, and statistical tests.
This description logic was inspired by DAML+OIL and RDF and is called Simulation
Description Logic. In the N3 notation for RDF, the basic syntax is a simple one:
<variable1> <relationship> <variable2>, where variable1 could be a subject,
relationship could be a verb, and variable2 could be an object.

The Alert module evaluates simulation output data with respect to corresponding
empirical data. Before this evaluation, the Alert module computes the description of the
output data, possibly using statistical tools. For example, the Alert module can
symbolically describe the ups-and-downs of a school absenteeism curve taking into
account other symbolic/contextual information such as the holidays and vacations. The
evaluation produces symbolic alert information: the Alert module converts quantitative
data into symbolic alert categories. As noted before, the notion of alert here includes
normal symbolic information, not just emergency/alert information. In other words, it is
in essence the symbolic categorization or identification of numerical information. A
measure of validity can be computed using special categories denoting that the outputs
“match empirical data and/or knowledge”. While not depicted in the figure to avoid
unnecessary clutter, the Alert module can semantically categorize input data and
empirical data as well.
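The two Alert tasks (describing the data, then matching the description against empirical data) can be illustrated with a minimal sketch. It is not WIZER's implementation: the function names, the 10% tolerance, the category labels, and the absenteeism values are all assumptions made up for illustration, and the validity measure is simply the fraction of alerts that fall in the "matches" category.

```python
from statistics import mean

MATCH = "matches-empirical-data"  # hypothetical label for the special "match" category


def describe(series):
    """Describe a numeric series symbolically: its mean and its overall trend."""
    if series[-1] > series[0]:
        trend = "rising"
    elif series[-1] < series[0]:
        trend = "falling"
    else:
        trend = "flat"
    return {"mean": mean(series), "trend": trend}


def alert(simulated, empirical, tolerance=0.10):
    """Compare the two descriptions and return symbolic alerts as (feature, category)."""
    sim, emp = describe(simulated), describe(empirical)
    # Assumes a positive empirical mean, so the band below is well ordered.
    low, high = emp["mean"] * (1 - tolerance), emp["mean"] * (1 + tolerance)
    if sim["mean"] < low:
        mean_label = "too-low"
    elif sim["mean"] > high:
        mean_label = "too-high"
    else:
        mean_label = MATCH
    trend_label = MATCH if sim["trend"] == emp["trend"] else "trend-mismatch"
    return [("mean", mean_label), ("trend", trend_label)]


def validity(alerts):
    """Fraction of alerts that fall in the special 'match' category."""
    return sum(1 for _, label in alerts if label == MATCH) / len(alerts)


# Made-up weekly absenteeism rates, for illustration only.
simulated_absenteeism = [0.04, 0.05, 0.09, 0.12]
empirical_absenteeism = [0.03, 0.04, 0.05, 0.06]

alerts = alert(simulated_absenteeism, empirical_absenteeism)
print(alerts, validity(alerts))
```

The symbolic pairs returned by `alert` are the kind of input that causation and IF-THEN rules can consume, which numeric curves alone are not.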

The Inference Engine takes the outputs from the Alert module and the simulator's
causal diagram and possibly a meta-model (of the simulation's knowledge space), in
addition to empirical data, domain knowledge, and parameter constraints (of the domain
knowledge space), to make a judgment on which parameters, causal links, and model
elements to change - or not to change - and how. How much a parameter value or a link
should change is influenced by the simulation model. The Inference Engine calculates the
minimal perturbations to the model to fit the outputs according to a model-based
approach similar to Assumption-Based Truth Maintenance Systems, which keep the
assumptions about the model in an environment lattice. The model (including the causal
diagram) and the potential alternate models are coded in ontology and rules using
Simulation Description Logic. The perturbations are implemented as the effects of
ontological and rule-based reasoning. The inference produces new parameters for the
next simulation. This cycle repeats until a user-defined validity level is achieved. (The
user interface module is not shown for clarity.) In short, the Inference Engine figures out
which parameters, links, and models need to change to fit the simulation to empirical data
and domain knowledge.
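The simulate, alert, infer, re-simulate cycle described above can be sketched with a toy example. Everything here is an assumption for illustration: the one-line stand-in simulator, the parameter and variable names, the bounds, and the fixed 10% adjustment step. In particular, the step rule is a simple stand-in and does not compute minimal perturbations in the truth-maintenance style described in the text.

```python
def simulate(params):
    """Stand-in simulator: the number of cases grows linearly with the infection rate."""
    return params["infection_rate"] * 1000


def categorize(output, target, tolerance=0.05):
    """Symbolic alert for one output compared with its empirical target."""
    if output > target * (1 + tolerance):
        return "too-high"
    if output < target * (1 - tolerance):
        return "too-low"
    return "matches-empirical-data"


# Causal knowledge: output variable -> (parameter that drives it, sign of the influence).
CAUSES = {"cases": ("infection_rate", +1)}
# Domain constraints on parameter values.
BOUNDS = {"infection_rate": (0.0, 1.0)}


def infer_change(label, variable, params, step=0.10):
    """Use the alert and the causal link to propose a new, bounded parameter value."""
    name, sign = CAUSES[variable]
    direction = {"too-high": -1, "too-low": +1}[label] * sign
    low, high = BOUNDS[name]
    proposed = params[name] * (1 + direction * step)
    return name, min(max(proposed, low), high)


def validation_cycle(params, target, max_runs=50):
    """Repeat simulate -> alert -> infer until the output matches or runs are exhausted."""
    for run in range(1, max_runs + 1):
        label = categorize(simulate(params), target)
        if label == "matches-empirical-data":
            return params, run
        name, value = infer_change(label, "cases", params)
        params = {**params, name: value}
    return params, max_runs


final_params, runs = validation_cycle({"infection_rate": 0.5}, target=300)
print(final_params, runs)
```

Here the user-defined validity level is reduced to a single tolerance band around one empirical target; a fuller version would aggregate many alerts into one validity measure before deciding whether to stop.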

In  addition  to  having  rule  and  causal  inference  submodules,  WIZER  has 
submodules  for  simulator  knowledge  and  domain  knowledge  operation,  validation,  and 
model-improvement.  The  validation  submodule  computes  the  degree  of  match  between 
simulation  outputs  and  knowledge  against  empirical  data  and  knowledge.  The  model- 
improvement  submodule  determines  the  changes  needed  to  make  the  simulator  outputs 
and  knowledge  better  match  the  empirical  data  and  knowledge.  Empirical  knowledge 
here  forms  domain  knowledge;  while  some  domain  knowledge  may  not  be  empirical,  we 
use  the  terms  interchangeably  here  for  the  notion  of  target  knowledge.  To  compute  the 
needed  changes,  hypothesis  building  is  employed  based  on  existing  knowledge.  The  next 
simulation(s)  would  then  test  the  hypothesis.  An  additional  routine  keeps  track  of 
whether there is an improvement in the simulation validity.

Figure 4 shows the inference or reasoning types in WIZER. The diagram in the figure is
the same as that of Figure 3, but with the inner reasoning types shown and the data
flow descriptions hidden for clarity.


Figure 4. Types of Reasoning in WIZER


As shown, the Alert module employs statistical inference, comparison, and
semantic categorization aided by knowledge and ontology. The Inference Engine has
reasoning mechanisms which form the core of WIZER: causal reasoning, “if-then”
rule-based reasoning, conflict resolution, model perturbation, and ontological reasoning for
validation and model improvement purposes. It also has model comparison and
hypothesis formation for the purpose of model improvement. Both the domain knowledge
space and the simulation knowledge space employ ontological reasoning. The simulator
acts as if it has “simulation” reasoning, which plays a role in producing emergent
behaviors, for example. The hypotheses are tested by proxy in a simulator validated
against empirical data and knowledge. They can also be tested directly against the
empirical data. Data mining and machine learning tools, outside the current WIZER
implementation, can be employed to extract information from the empirical data.


4.2  Definition  of  WIZER  by  the  Computer  Science 
Concepts 


WIZER is a mix of knowledge-based and ontology-based systems, tied to the simulation
model. It includes model-based reasoning in the form of causal and ontological
reasoning. It also has rule-based reasoning tied to the model. Additionally, it describes its
statistical tools using ontology. Underlying the causal relations, WIZER has the process
ontology and process logic based on the simulator conceptual model (and thus the code
implementation of the conceptual model). WIZER model-based reasoning is similar to
truth maintenance systems, but instead of using a dependency network (or an
environment lattice), it uses a causal network. WIZER rule-based reasoning is similar to
a forward-chaining system but with rules tied to the simulation model and with ontology-
and model-based conflict resolution. The knowledge-based and ontology-based routines
are closely tied to the simulation models, simulations, and empirical data. This makes
WIZER different from other knowledge-based systems.

Concisely, WIZER is defined as an ontological and knowledge-based simulation
model reasoning system, with process, rules, causation, and statistical components.

The  steps  for  preparing  a  simulation  system  for  WIZER  are: 

1. Take or create the conceptual model of the simulation.

2. Create the causal model from the conceptual model. This causal model consists of
the abstract influence/causal model and the concrete causal model. The abstract
causal model represents which variable influences another variable. (This abstract
causal model can be thought of as the influence model, but I use the notion causal
model to emphasize causality.) The concrete causal model represents how a
variable with a value causes another variable to have another value. These causal
models allow expedited probing of the root cause of a problem. This is similar to
the environment lattice which allows perturbations to the system descriptions in
assumption-based truth maintenance systems.

3.  Create  the  process  logic  for  each  causal  relation  in  the  causal  model.  This  process 
logic  is  closely  tied  to  implementation  code. 

4.  For  each  relevant  output  variable  of  a  causal  relation,  create  a 
semantic/ontological  description  or  potential  classification  of  the  possibly 
dynamic  output/variable. 

5.  Create  rules  based  on  the  causal  model  and  the  process  logic. 

6.  Introduce  conflict  resolution  rules  based  on  the  causal  model. 

7. For all the steps above, the relevant ontology is created and used as needed.

Generating causal models from simulation conceptual models may be ontologically and
computationally feasible, but is not done here. Physically, mechanisms and/or processes
form the foundation for causality. Causal relations are constructed by human beings
based on perceived or inferred order in the seemingly chaotic world.

4.2.1 An Example of WIZER Setup


A  small  portion  of  the  code  of  the  BioWar  simulator  is  presented  below  in  pseudo-code 
to  serve  as  an  illustration.  The  pseudo-code  represents  a  function  which  adds  new 
symptoms  and  symptom  severity  values  to  an  agent  who  contracts  a  disease. 

function AddSymptoms in an agent who has a disease
    for all new symptoms do
        let symp be a new symptom
        if symp already exists in the agent
            increase duplicate symptoms count
        else
            get the evoking strength value from the QMR table referenced by symp
            convert the evoking strength value to a severity value
            add the severity value to the total severity value
        end of if
        add the symptom symp to the agent
    end of for
end of function, return the total severity value

note QMR stands for Quick Medical Reference, a table relating diseases and symptoms
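For readers who prefer running code, the pseudo-code above can be transcribed roughly as follows. This is a sketch, not BioWar's actual code: the `Agent` class, the sample table entries, and the linear conversion from evoking strength to severity are hypothetical placeholders, since the source does not give the real table contents or conversion.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt of a QMR-style table: symptom -> evoking strength.
# The entries and the 0-5 scale are assumptions for illustration.
QMR_EVOKING_STRENGTH = {"fever": 3, "cough": 2, "chest pain": 4}


def to_severity(evoking_strength):
    """Hypothetical conversion: scale an evoking strength of 0-5 to a 0.0-1.0 severity."""
    return evoking_strength / 5


@dataclass
class Agent:
    symptoms: list = field(default_factory=list)
    duplicate_count: int = 0


def add_symptoms(agent, new_symptoms):
    """Add symptoms to the agent and return the total severity of the new ones."""
    total_severity = 0.0
    for symp in new_symptoms:
        if symp in agent.symptoms:
            agent.duplicate_count += 1
        else:
            total_severity += to_severity(QMR_EVOKING_STRENGTH[symp])
        # As in the pseudo-code, the symptom is added after either branch.
        agent.symptoms.append(symp)
    return total_severity


agent = Agent(symptoms=["cough"])
total = add_symptoms(agent, ["fever", "cough"])
print(total, agent.duplicate_count, agent.symptoms)
```

Only first occurrences contribute to the returned total, while duplicates are merely counted, which is the behavior the pseudo-code specifies.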


The  step-by-step  procedure  for  the  above  routine  is  as  follows: 

1.  The  conceptual  model  for  this  routine  is  an  agent  with  a  disease  having  one  or 
more  symptoms  manifested  for  this  disease  and  these  symptoms  have  severity 
values  whose  sum  is  sought. 

2.  The  abstract  causal  model  for  the  routine  is  simply  “the  existence  of  a  symptom 
causes  the  realization  of  the  symptom  severity,  which  in  turn  causes  the  increase 
in  the  total  severity  for  this  agent”.  (Of  course,  a  symptom  and  its  severity  are 
inseparable  physically,  but  this  is  the  causal  model  for  the  simulation,  not  for  the 
empirical  world.)  The  concrete  causal  model  is  not  available  for  this  example. 

3. The process logic (and the process model) is the logic and semantic description of
the pseudo-code, the algorithm, and the code itself (for simulation models). It is
augmented with the process ontology. For empirical or domain knowledge, the
process logic represents the real world process.

4.  We  have  three  variables  in  the  causal  model:  the  existence  of  a  symptom,  the 
severity  of  this  symptom,  and  the  sum  of  the  total  severity  of  all  symptoms.  These 
are  all  described  semantically.  In  more  complex  variables,  curves  or  surfaces  may 
be  described  semantically. 

5. The rule relating the severity of the symptom to the existence of the symptom
is simply “severity of the symptom implies existence of the symptom” for the
above routine. Moreover, another rule can say “if total severity value of
symptoms is not zero then some symptoms must exist”.

6.  There  is  no  conflict  resolution  for  the  rules,  as  the  rules  are  simple  and  have  no 
conflict. 

7.  Relevant  ontologies  are  created  for  the  conceptual  model,  causal  model,  process 
model,  and  rules. 
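Steps 2 and 5 above can be made concrete with a short sketch. The abstract causal model is written as (cause, verb, effect) triples and walked backwards to probe for root causes, and the two rules from step 5 are checked against one agent's recorded state. The variable names, the dictionary layout of the state, and the violation messages are assumptions chosen for illustration, not WIZER's internal representation.

```python
from collections import deque

# Step 2: the abstract causal model as (cause, "causes", effect) triples.
CAUSAL_MODEL = [
    ("symptom exists", "causes", "symptom severity"),
    ("symptom severity", "causes", "total severity"),
]


def upstream_causes(variable, causal_model):
    """Walk causal links backwards to list everything that influences `variable`."""
    found, queue = [], deque([variable])
    while queue:
        current = queue.popleft()
        for cause, _, effect in causal_model:
            if effect == current and cause not in found:
                found.append(cause)
                queue.append(cause)
    return found


# Step 5: the two rules, checked against a recorded agent state.
def violated_rules(state):
    violations = []
    # Rule 1: severity of a symptom implies existence of that symptom.
    for symptom in state["severity"]:
        if symptom not in state["symptoms"]:
            violations.append(f"severity without symptom: {symptom}")
    # Rule 2: if the total severity is not zero, some symptoms must exist.
    if state["total_severity"] != 0 and not state["symptoms"]:
        violations.append("non-zero total severity without symptoms")
    return violations


consistent = {"symptoms": ["fever"], "severity": {"fever": 0.6}, "total_severity": 0.6}
inconsistent = {"symptoms": [], "severity": {"cough": 0.4}, "total_severity": 0.4}

print(upstream_causes("total severity", CAUSAL_MODEL))
print(violated_rules(consistent))
print(violated_rules(inconsistent))
```

When a rule is violated, the backward walk over the causal triples gives the candidate variables to inspect first, which is the sense in which the causal model expedites root cause probing.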




4.3  Simulation  Description  Logic 


Description logics are considered one of the most important knowledge representation
schemes, unifying and giving a logical basis to previous representations such as
frame-based systems, semantic networks and KL-ONE-like languages, object-oriented
representations, semantic data models, and type systems. Resource Description
Framework (RDF) is a universal format for data on the Internet based on description
logic. Using a simple relational model, it allows structured and semi-structured data to be
mixed, exported, and shared across different applications. RDF data describes all sorts of
things, and where XML Schema just describes documents, RDF - and DAML+OIL -
talks about actual things. In RDF, information is simply a collection of statements, each
with a subject, verb and object - denoted as a triple. A human readable notation for RDF
is known as Notation3 or N3. In N3, an RDF triple can be written as follows, ending with
a period:

<#pat> <#knows> <#jo> .

DAML  provides  a  method  for  stating  properties  such  as  inverses,  unambiguous 
properties,  unique  properties,  lists,  restrictions,  cardinalities,  pairwise  disjoint  lists, 
datatypes,  and  others. 

Following the spirit of description logic, I create here the Simulation Description
Logic (SDL), which adopts a somewhat modified triple notation. The subject and object
parts of the triple can be any variable, instance, or concept. In WIZER, each of these
parts can also consist of multiple variables, instances, or concepts; this forms a
modified N3. The verb part of the triple contains any relation of interest. In particular,
the verbs “causes” and “is influenced by” denote causal relations, and “if-then” denotes
if-then rule relations. In this way the real-world semantics of what the simulation code
is supposed to accomplish can be encoded. These triples correspond to the facts and rules
of the knowledge base. The real-world semantics can be delimited and influenced by the
policy question at hand.

Additionally, simulation outputs often come in the form of curves. To describe them, I
supplement the semantic description given by the above triple notation with an
augmented notation,

<input> <computational logic> <output> .

which denotes the computational logic or process. The process logic underlying causal
relations adopts a similar notation. For example, the process for the statistical
computation of a mean is as follows:

Name:  mean  computation 

Input: real numbers N1, N2, ..., Nx of a sample population, or in N3,

<x numbers> <is part of> <a sample population> .

Computational logic: add(N1, N2, ..., Nx) divided by count(N1, N2, ..., Nx)

Mean describes a sample population, or in N3:

<mean> <describes> <a sample population> .

<mean> <is> <add numbers divided by number count> .

Output:  result  of  the  computational  logic  in  the  form  of  a  real  number  N 

Using  this  logic  description,  the  Inference  Engine  can  be  made  to  reason  about  the 
statistical  tools  it  is  using.  The  triple  <mean>  <describes>  <a  sample population>  is  the 
semantic  description,  while  the  “add... divided  by  count...”  part  is  the  computational  or 
process  logic.  The  former  is  declarative,  the  latter  is  procedural.  Both  are  needed  and 
related  to  each  other. 
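
The pairing of a declarative description with a procedural routine can be illustrated
with a minimal Python sketch. It is not WIZER's implementation; the names
DescribedRoutine, MEAN, and routine_describes are hypothetical and introduced only for
illustration. The triples are stored as plain tuples, and a semantic query is answered
from the triples alone, without running the computation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)


@dataclass(frozen=True)
class DescribedRoutine:
    """Hypothetical container pairing declarative triples with a procedure."""
    name: str
    semantics: Tuple[Triple, ...]                   # declarative part
    compute: Callable[[Sequence[float]], float]     # procedural part


def mean(numbers: Sequence[float]) -> float:
    """Computational logic: add(N1, ..., Nx) divided by count(N1, ..., Nx)."""
    if not numbers:
        raise ValueError("mean of an empty sample is undefined")
    return sum(numbers) / len(numbers)


MEAN = DescribedRoutine(
    name="mean computation",
    semantics=(
        ("x numbers", "is part of", "a sample population"),
        ("mean", "describes", "a sample population"),
    ),
    compute=mean,
)


def routine_describes(routine: DescribedRoutine, target: str) -> bool:
    """Answer a semantic query from the triples, without running the procedure."""
    return any(verb == "describes" and obj == target
               for _, verb, obj in routine.semantics)
```

A reasoner could use routine_describes to select a routine suitable for a sample
population and only then call its compute part on the data.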


4.4  Simulation  and  Knowledge  Spaces 


The  steps  of  finding  a  match  between  the  simulator  outputs/model  and  the  target 
knowledge  can  be  viewed  as  a  search  in  model  and  knowledge  spaces.  Figure  5  illustrates 
the  relationship  between  the  search  in  the  model  and  knowledge  spaces.  In  the  model 
space,  the  search  is  through  its  parameters  and  links.  In  the  knowledge  space,  the  search 
is  through  rules  and  causations,  in  addition  to  ontological  reasoning.  The  two  searches 
influence  each  other.  The  results  of  search  in  knowledge  space  may  reduce  the  scope  for 
search  in  simulation  model  space;  the  range  of  allowed  parameter  values  in  simulation 
model  space  may  restrict  the  scope  of  search  in  knowledge  space.  The  perturbation  of 
system  description  is  described  in  the  knowledge  and  simulation  space  using  ontology 
and  rules.  This  includes  the  consideration  of  variable  compounding  (e.g.,  variables  that 
must  be  simultaneously  on,  off,  or  have  certain  values  for  a  specific  output  response  to 
occur). 

Figure 5. Searches through Simulation Model versus Knowledge Spaces (panels: Simulation
Model Space, Knowledge Space)


In WIZER, the simulation knowledge space is represented by a graph with its
corresponding operations. This graph has nodes representing concepts and links/edges
representing the relationships between concepts. Furthermore, it is augmented by
knowledge bases and routines denoting how the value of a node should change with
respect to its neighboring node(s). In N3, this is written as

<node1> <relationship> <node2> .

The  N3  requirements  are  relaxed  somewhat  in  WIZER  in  that  the  nodes  can  contain 
multiple  variables,  instances,  or  concepts.  For  example,  the  type  of  the  management  of  a 
firm  can  be  either  homogeneous  or  heterogeneous,  and  in  a  modified  N3  this  can  be 
written  as 

<management>  <has_type_of>  <homogeneous,  heterogeneous>  . 
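
A minimal Python sketch of such a graph, in which a node may hold several values as in
the modified N3 above, is given here. The class name KnowledgeGraph and its methods are
hypothetical and are not taken from WIZER; the sketch only shows one way to store a
triple whose object part contains multiple concepts.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set, Tuple


class KnowledgeGraph:
    """Hypothetical store of triples whose object part may hold several values."""

    def __init__(self) -> None:
        # Maps (subject, relationship) to the set of objects linked to it.
        self._edges: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, subject: str, relationship: str, *objects: str) -> None:
        """Record <subject> <relationship> <object1, object2, ...> ."""
        self._edges[(subject, relationship)].update(objects)

    def objects(self, subject: str, relationship: str) -> FrozenSet[str]:
        """Return every object linked to the subject by the relationship."""
        return frozenset(self._edges.get((subject, relationship), ()))


graph = KnowledgeGraph()
# <management> <has_type_of> <homogeneous, heterogeneous> .
graph.add("management", "has_type_of", "homogeneous", "heterogeneous")
```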


4.5 Alert WIZER


The  Alert  module  evaluates  simulation  output  data  with  respect  to  corresponding 
empirical  data.  The  evaluation  can  be  in  the  form  of  alerts  set  when  the  simulated  data  is 
outside  empirical  data  bounds.  To  generate  alerts,  the  data  must  be  described,  possibly 
using  statistical  tools.  After  data  description,  comparison  with  empirical  data  yields 
symbolic  information.  The  Alert  module  converts  quantitative  data  into  alert  symbolic 
information.  The  notion  of  alert  here  includes  normal  non-emergency  symbolic 
information.  The  Alert  module  acts  as  a  symbolic  or  knowledge-based  pattern 
classification  and  recognition  system.  Potential  patterns  can  be  enumerated  in  advance. 
Surprise  patterns  can  be  classified  by  their  componential  knowledge.  Totally  unexpected 
patterns  can  be  marked  as  such  and  a  special  alert  is  issued  for  human  examination  of 
these  patterns. 

The  Alert  can  also  evaluate  any  simulation  data,  including  input  data,  based  on 
symbolic,  rule-based,  and/or  ontological  criteria.  For  example,  the  Alert  can  characterize 
the  interaction  network  for  management  personnel  as  either  homogeneous  or 
heterogeneous  based  on  the  features  of  the  interaction  network. 

The  statistical  routines  that  the  Alert  could  use  include: 

1. Mean computation

Input: real numbers N1, N2, ..., Nx of a sample population, or in N3,

<x numbers> <is part of> <a sample population> .

Computational logic: add(N1, N2, ..., Nx) divided by count(N1, N2, ..., Nx)

Mean  describes  a  sample  population,  or  in  N3: 

<mean>  <describes>  <a  sample  population>  . 

Output:  result  of  the  computational  logic  in  the  form  of  a  real  number  N 

2.  Minimum  and  maximum  computation 

Input: real numbers N1, N2, ..., Nx of a sample population, or in N3,

<x numbers> <is part of> <a sample population> .

Computational logic: max(N1, N2, ..., Nx)

Max  describes  a  sample  population,  or  in  N3: 

<max>  <describes>  <a  sample  population>  . 

Output:  result  of  the  computational  logic  in  the  form  of  a  real  number  N. 

(similar  semantics  for  the  minimum) 

3. Variance and standard deviation computation

Input: real numbers N1, N2, ..., Nx of a sample population, or in N3,

<x numbers> <is part of> <a sample population> .

Computational logic: (sum of (N - mean of (N)) squared) divided by (x - 1);
the standard deviation is the square root of this value

Variance  describes  a  sample  population,  or  in  N3: 

<variance>  <describes>  <a  sample  population>  . 

Output:  result  of  the  computational  logic  in  the  form  of  a  real  number  N 

(similar  semantics  for  the  standard  deviation) 

4. Curve classification/categorization. There are many types of curve
classification. A curve can be matched against a template. Alternatively, we can
symbolically describe the curve trends. For example, for a monotonically
increasing curve, the semantics are as follows.

Input: real numbers Y1, Y2, ..., Yn of the Y-axis of a curve, or in N3,

<n numbers> <is part of> <Y-axis of a curve> .

real numbers X1, X2, ..., Xn of the X-axis of a curve, or in N3,

<n numbers> <is part of> <X-axis of a curve> .

Computational logic: for every pair of points with X2 >= X1, Y2 >= Y1 must be true.

"Monotonically  increasing"  describes  a  curve,  or  in  N3: 

<monotonically  increasing>  <describes>  <a  curve>  . 

Output: result of the computational logic in the form of a Boolean value denoting whether
the curve is monotonically increasing.

5. Peak classification

Input: real numbers Y1, Y2, ..., Yn of the Y-axis of a curve, or in N3,

<n numbers> <is part of> <Y-axis of a curve> .

real numbers X1, X2, ..., Xn of the X-axis of a curve, or in N3,

<n numbers> <is part of> <X-axis of a curve> .

Computational logic: find Ymax such that, over all X,

(Ymax >= all Y and Ymax > most Y) must be true, and

(Ymax > average Y for Xmax-Delta < X < Xmax+Delta),

where Delta is a number delineating the closest neighbors of Xmax

"Peak" describes a curve, or in N3:

<peak> <describes> <a curve> .

Output: result of the computational logic in the form of (X, Y) coordinate pairs
denoting where the peak is. This assumes the outliers have been
previously filtered out. The computational logic above checks the closest
neighboring points to the peak to see if they are higher than average, to
guard against outliers.

6. Value range classification

Input: real numbers N1, N2, ..., Nx of a sample population, or in N3,

<x numbers> <is part of> <a sample population> .

two numbers denoting the range [A, B], where A is the lower end of the range

Computational logic: all N >= A and all N <= B

Range  describes  a  sample  population,  or  in  N3: 

<range>  <describes>  <a  sample  population>  . 

Output:  result  of  the  computational  logic  in  the  form  of  a  Boolean  value  denoting 
whether  the  numbers  satisfy  the  range. 
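
The computational logic of routines 3 to 6 can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not WIZER's code, and all function
names are hypothetical. For the peak routine the neighbor check is ambiguous in the
description; the reading assumed here is that the mean of the points within Delta of
Xmax (excluding the peak itself) must exceed the mean of all Y values, otherwise the
maximum is treated as an outlier and no peak is reported.

```python
import math
from typing import Optional, Sequence, Tuple


def sample_variance(ns: Sequence[float]) -> float:
    """Sum of squared deviations from the mean, divided by (x - 1)."""
    if len(ns) < 2:
        raise ValueError("need at least two numbers")
    m = sum(ns) / len(ns)
    return sum((n - m) ** 2 for n in ns) / (len(ns) - 1)


def sample_std(ns: Sequence[float]) -> float:
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(ns))


def is_monotonically_increasing(xs: Sequence[float], ys: Sequence[float]) -> bool:
    """True if Y never decreases as X increases (points are sorted by X first)."""
    points = sorted(zip(xs, ys))
    return all(y2 >= y1 for (_, y1), (_, y2) in zip(points, points[1:]))


def find_peak(xs: Sequence[float], ys: Sequence[float],
              delta: float) -> Optional[Tuple[float, float]]:
    """Return (Xmax, Ymax), or None if the maximum looks like an outlier.

    Assumed reading of the neighbor check: the points within delta of Xmax
    must on average lie above the overall mean of Y.
    """
    if not ys or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    i = max(range(len(ys)), key=lambda j: ys[j])
    x_max, y_max = xs[i], ys[i]
    overall_mean = sum(ys) / len(ys)
    neighbors = [y for j, (x, y) in enumerate(zip(xs, ys))
                 if j != i and abs(x - x_max) < delta]
    if not neighbors or sum(neighbors) / len(neighbors) <= overall_mean:
        return None
    return (x_max, y_max)


def in_range(ns: Sequence[float], a: float, b: float) -> bool:
    """True if every number N satisfies A <= N <= B."""
    return all(a <= n <= b for n in ns)
```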

The Alert itself has many different types of alerts, including:

1. Value-too-high and value-too-low alerts for bound checking

Input: real number N describing a sample population, or in N3,

<a number> <describes> <a sample population> .

two numbers denoting the bounds [A, B], where A is the lower bound

Computational logic: N >= A and N <= B gives out the "Normal" alert
N < A gives out the "Value-too-low" alert
N > B gives out the "Value-too-high" alert

Value-too-high alert describes a number with respect to empirical bounds, or in N3:

<value-too-high  alert>  <describes>  <a  number>  . 

<value-too-high  alert>  <measured  against>  <empirical  bounds>  . 
(similarly  for  value-too-low  and  normal  alerts) 

Output:  result  of  the  computational  logic  in  the  form  of  alerts  showing  whether 
the  number  is  above,  below,  or  within  the  bounds. 

2.  Mean-different  alerts  for  mean  comparison 

Input:  real  number  N  describing  a  sample  population,  or  in  N3, 

<a  number>  <describes>  <a  sample  population>  . 
the  number  M  denoting  the  empirical  mean 

the  tolerance  E  denoting  the  amount  of  difference  that  can  be  tolerated 
Computational  logic:  M-E  <=  N  and  N  <=  M+E,  gives  out  the  "Same"  alert 
otherwise  gives  out  "Different"  alert 
"Same"  alert  describes  a  number,  or  in  N3: 

<same  alert>  <describes>  <a  number>  . 

<same  alert>  <measured  against>  <empirical  mean>  . 

(similarly  for  the  "different"  alert) . 

Output:  result  of  the  computational  logic  in  the  form  of  alerts  showing  whether 
the  number  matches  the  empirical  mean  or  not. 

3.  Variance-different  alerts  for  variance  comparison 

Input:  real  number  N  describing  a  sample  population,  or  in  N3, 

<a  number>  <describes>  <a  sample  population>  . 
the  number  V  denoting  the  empirical  variance 

the  tolerance  E  denoting  the  amount  of  difference  that  can  be  tolerated 
Computational logic: V-E <= N and N <= V+E, gives out the "Same" alert
otherwise gives out "Different" alert
"Same" alert describes a number, or in N3:

<same  alert>  <describes>  <a  number>  . 

<same  alert>  <measured  against>  <empirical  variance>  . 

(similarly  for  the  "different"  alert) . 

Output:  result  of  the  computational  logic  in  the  form  of  alerts  showing  whether 
the  number  matches  the  empirical  variance  or  not. 


4.  Value  range  mismatch  alerts  for  value  range  classification 

Input:  real  numbers  A,  B  describing  a  sample  population,  or  in  N3, 

<range  numbers>  <describes>  <a  sample  population>  . 
the  numbers  MIN,  MAX  denoting  the  empirical  range 

the  tolerance  E  denoting  the  amount  of  difference  that  can  be  tolerated 
Computational logic: if MIN-E <= A and A <= MIN+E and MAX-E <= B and B <= MAX+E,
gives  out  the  "Same"  alert 
otherwise  gives  out  "Different"  alert 
"Same"  alert  describes  a  range,  or  in  N3: 

<same  alert>  <describes>  <a  range>  . 

<same  alert>  <measured  against>  <empirical  range>  . 

(similarly  for  the  "different"  alert) . 

Output:  result  of  the  computational  logic  in  the  form  of  alerts  showing  whether 
the  range  numbers  matched  the  empirical  range  or  not. 

5.  Peak  mismatch  alerts  for  peak  classification 

Input: coordinate (X, Y) of real numbers describing a curve peak, or in N3,

<a coordinate> <describes> <a curve peak> .

the coordinate (U, V) denoting the empirical peak

the tolerances EX, denoting the amount of difference in the X-axis, and
EY, denoting the difference in the Y-axis, that can be tolerated

Computational logic: if U-EX <= X and X <= U+EX and
V-EY <= Y and Y <= V+EY, gives out the "Match" alert
otherwise gives out "Mismatch" alert

"Match" alert describes a coordinate, or in N3:

<match alert> <describes> <a coordinate> .

<match alert> <measured against> <empirical peak> .

(similarly  for  the  "mismatch"  alert) . 

Output:  result  of  the  computational  logic  in  the  form  of  alerts  showing  whether 
the  coordinate  matches  the  empirical  peak  or  not. 

6. Numerous curve-type alerts for curve comparison. This is done by
template matching. More sophisticated matching algorithms such as
classifiers can be employed too. For example, a Support Vector Machine
(SVM) can be employed to learn how to classify a set of labeled data
(Cristianini and Shawe-Taylor 2000).

7. Peak relative difference. For example, comparing the time difference
between two peaks.

8. Two simulated means bracket an empirical mean.

9. Relative magnitude of curves.

10. Other specialized symbolic, rule-based, and/or ontological categorizations.
For example, semantically describing an interaction network as either
homogeneous or heterogeneous.
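
The first five alert types share one pattern: a number or coordinate is compared against
empirical values and a symbolic alert is returned. A minimal Python sketch of that
pattern follows. The function names are hypothetical; the alert strings and the
comparisons are those listed above. The tolerance check serves the mean-different and
variance-different alerts alike, since both compare one number against one empirical
value.

```python
from typing import Tuple


def bound_alert(n: float, lower: float, upper: float) -> str:
    """Bound checking against the empirical bounds [A, B]."""
    if n < lower:
        return "Value-too-low"
    if n > upper:
        return "Value-too-high"
    return "Normal"


def tolerance_alert(n: float, empirical: float, tolerance: float) -> str:
    """Mean or variance comparison: "Same" if n is within E of the empirical value."""
    if empirical - tolerance <= n <= empirical + tolerance:
        return "Same"
    return "Different"


def range_alert(a: float, b: float, emp_min: float, emp_max: float,
                tolerance: float) -> str:
    """Value range comparison: both ends must be within E of the empirical range."""
    ends_match = (tolerance_alert(a, emp_min, tolerance) == "Same"
                  and tolerance_alert(b, emp_max, tolerance) == "Same")
    return "Same" if ends_match else "Different"


def peak_alert(peak: Tuple[float, float], empirical_peak: Tuple[float, float],
               ex: float, ey: float) -> str:
    """Peak comparison with separate tolerances EX and EY for the two axes."""
    (x, y), (u, v) = peak, empirical_peak
    if u - ex <= x <= u + ex and v - ey <= y <= v + ey:
        return "Match"
    return "Mismatch"
```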

As shown above, the statistical routines and the alerts are encoded with an
augmented description logic notation to allow their use in the Inference Engine
reasoning. The augmented description logic adopts an approach similar to the Description
Logic Program, which inter-operates rules and ontologies semantically and inferentially.
The description logic is declarative, with the imperative routines tied to the declarations.

There are different types of simulation data. As WIZER is a knowledge-based
tool, it can flexibly handle the different types of simulation data. The data types
include:

1. Single-simulation-run output data: in this case, WIZER just takes the output
values, categorizes them, and reasons about the categories.

2. N-simulation-run (N>1) output data: in this case, WIZER computes the
probability that the output values fall into a category. In other words, it counts the
number of times the output values fall into a category and divides this by the total
number of simulation runs. WIZER could also compute the statistics for the
output data before putting it into categories. Whether to do so depends on the nature
of the data, so care must be taken in which methods are applied. In general, curves
should not be averaged but rates can be averaged. For example, curves of doctor
visits across N simulation runs in a 1 year simulation should not be averaged, but
the rates of doctor visits across N simulation runs for 1 year intervals could be
averaged.

3. Longitudinal data: an agent-based simulator could trace the history of an
individual in the course of the simulation. In this case, WIZER could put the data
in longitudinal categories and reason about them.
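
For the N-simulation-run case, the category probability described in item 2 is a simple
relative frequency. The following Python sketch illustrates it; the function
category_probabilities and the example categorizer are hypothetical and use made-up
bounds purely for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Sequence


def category_probabilities(values: Sequence[float],
                           categorize: Callable[[float], str]) -> Dict[str, float]:
    """Fraction of simulation runs whose output falls into each category."""
    if not values:
        raise ValueError("at least one simulation run is required")
    counts = Counter(categorize(v) for v in values)
    return {category: count / len(values) for category, count in counts.items()}


def example_categorizer(value: float) -> str:
    """Hypothetical bound check with illustrative bounds [4, 10]."""
    if value < 4:
        return "Value-too-low"
    if value > 10:
        return "Value-too-high"
    return "Normal"


# One output value per simulation run (illustrative numbers only).
run_outputs = [3, 5, 7, 12]
probabilities = category_probabilities(run_outputs, example_categorizer)
```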


4.5.1 Alert WIZER as applied to Testbeds

Two testbeds are used to test the validation automation capability of WIZER. The first is
BioWar, a city-scale multi-agent social-network model of weaponized disease spread in a
demographically realistic population with naturally occurring diseases. The second
is CONSTRUCT, a model for the co-evolution of social and knowledge networks
under diverse communication scenarios. Table 2 shows the features of the Alert WIZER
as applied to the two testbeds. The features are applied to the validation scenarios of
BioWar and CONSTRUCT in Chapters 6 and 7 respectively.


Table 2. Alert WIZER as Applied to BioWar and CONSTRUCT

Features of Alert WIZER                                      BioWar   CONSTRUCT
-----------------------------------------------------------  -------  ---------
Mean comparison                                              Yes      No
Curve peak determination                                     Yes      No
Relative timing of curve peaks                               Yes      No
Threshold determination                                      No       Yes
Simulated means bracket the empirical mean                   No       Yes
Curve magnitude comparison                                   No       Yes
Qualitative comparison of the curve magnitude comparisons    No       Yes
Interaction network categorization as homogeneous or         No       Yes
  heterogeneous


4.6 The Inference Engine


The  WIZER  Inference  Engine  is  based  on  reasoning  on  facts,  rules,  and  causations. 
Causations  describe  the  simulation  code.  Rules  are  tied  to  the  causal  relations  and  to  the 
simulation  entities,  so  that  the  number  of  rules  is  constrained.  This  partially  avoids  the 
computational  complexity  of  rule-based  systems.  The  declarative  causal  and  rule 
relations  are  in  turn  tied  to  the  procedural  simulation  code.  Thus  the  causal  and  rule 
relations  can  operate  on  the  simulation  code  through  inference.  Ontological  reasoning 
utilizing Simulation Description Logic augments the inference. Causations also describe
empirical  knowledge.  The  Inference  Engine  incorporates  hypothesis  building  and  testing 
as  a  way  to  explore  knowledge  space,  in  addition  to  rule-based/causation-based 
inferences.  The  simplest  hypothesis  building  method  is  simply  to  search  in  the  empirical 
knowledge’s  causal  graphs. 

Rule-based  probabilistic  argumentation  systems  (Haenni  et  al.  1999),  causal 
analysis  (Pearl  2003,  Spirtes  et  al.  2000),  and  the  Dempster-Shafer  Theory  of  Evidence 
were  early  inspirations  for  the  creation  of  inference  mechanisms  in  WIZER.  All  have 
weaknesses  which  make  them  unsuitable  for  use  in  the  validation  of  simulations.  The 
probabilistic  argumentation  systems  require  a  complete  probabilistic  space  to  function, 
which is hard to define for sociotechnical problems. Causal analysis makes
assumptions about conditional probability and conditional independence. It reduces the
problem  of  causality  to  graph  notation,  when  in  the  real  world  causality  is  much  more 
complex.  This  dissertation  suggests  causality  is  best  approached  by  simulating  the 
mechanisms  and  processes  closely. 

The Inference Engine has components for rule and causal clause operation,
simulation knowledge operation, domain knowledge operation, validation, and
model-improvement. Implicit in this are the math and statistics routines employed to
support all components. These support routines are semantically described and can be
used by the Inference Engine for reasoning. The application of rules is weighted by their
supporting evidence, within the context of the existing knowledge and model. The building
of hypotheses is based on the discrepancy between domain knowledge, empirical data, and
simulation knowledge.

The causation clauses of the Inference Engine define the knowledge space of the
simulator. The rule clauses of the Inference Engine encode what variables should be
changed given the alerts. As noted above, the rules are tied to the causal relations. The
firing of rules follows a forward-chaining method, which is data-driven. In the
forward-chaining method, the system compares data against the conditions (the IF parts)
of the rules and determines which rules to fire. Figure 6 shows the operations of the
forward-chaining method. Forward chaining is the method used by production systems.
Production systems are Turing-equivalent, which means they are as powerful as a Turing
machine. Thus production systems can compute anything that is computable. WIZER augments
the production system with ontological reasoning, a simulation descriptor, minimal model
perturbation, and operations on the computational or process logic representing the
procedural simulation code. The rules in WIZER have access to the ontology, enabling them,
for example, to deduce rules such as “if a person is a child then he/she plays with another
child”, as the ontology for children includes the attribute that they play with each other.
Integrating ontology with rule-based systems lends WIZER meta-rule capabilities similar to
those of high-performance expert systems.

Figure 6. Forward-Chaining Method (flowchart; one step is labeled "exit if specified by
rule")


As shown in Figure 6, using the rule base (the knowledge base containing the rules)
and working memory (a place to store facts), the system determines possible rules to fire.
These rules form a conflict set, where rules may conflict with each other. A conflict
resolution strategy is then employed to select which rule to fire. After the firing of the
rule, new inferred facts are added to the working memory. To prevent duplicate firing of
the same rule, the triggering facts are removed from working memory. If there are no
more rules to fire, the system stops. The rules in WIZER are not based on heuristics but
on the model, as they are tied to the simulation model. The conflict resolution strategy in
WIZER includes the task of selecting rules based on the result of forward-chaining
inference and also the task of determining what value/rule to change, and by how much,
based on the minimal perturbations to the model needed to fit the simulation and
inference results. The latter is a feature of model-based reasoning and is implemented
using ontological and rule-based reasoning in WIZER. Thus in WIZER the rules have a
supporting role of pinpointing areas to change, while the actual change to the value or
rule, and the amount of this change, is determined by model perturbations using
knowledge-based and ontological reasoning.
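
The forward-chaining cycle just described (match, form the conflict set, resolve, fire,
update working memory) can be sketched in Python as below. This is a generic production
system loop, not WIZER's Inference Engine: facts are plain strings, conflict resolution
is reduced to picking the highest priority, and the rule names and facts in the example
are invented for illustration. The max_firings guard is an added safety assumption so
that a rule which re-creates its own conditions cannot loop forever.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: FrozenSet[str]    # the IF part
    conclusions: FrozenSet[str]   # facts added when the rule fires
    priority: float = 0.0


def forward_chain(rules: Sequence[Rule], facts: Iterable[str],
                  max_firings: int = 1000) -> Tuple[Set[str], List[str]]:
    """Run the match / resolve / fire cycle until no rule can fire."""
    memory: Set[str] = set(facts)       # working memory
    fired: List[str] = []
    while len(fired) < max_firings:
        # Match: every rule whose conditions all hold forms the conflict set.
        conflict_set = [r for r in rules if r.conditions <= memory]
        if not conflict_set:
            break
        # Conflict resolution (simplified): fire the highest-priority rule.
        rule = max(conflict_set, key=lambda r: r.priority)
        # Remove the triggering facts to prevent duplicate firing,
        # then add the newly inferred facts.
        memory -= rule.conditions
        memory |= rule.conclusions
        fired.append(rule.name)
    return memory, fired


# Hypothetical rules chaining an alert to a suggested change.
example_rules = [
    Rule("r1", frozenset({"mortality value-too-high"}),
         frozenset({"suspect death rate parameter"})),
    Rule("r2", frozenset({"suspect death rate parameter"}),
         frozenset({"lower death rate parameter"})),
]
final_memory, firing_order = forward_chain(example_rules,
                                           {"mortality value-too-high"})
```

In this run r1 fires first, its conclusion enables r2, and the loop stops when no rule
matches the remaining facts.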

WIZER stores the history of conflict resolutions, model perturbations, and
simulation trials. This history allows WIZER to avoid testing the same set of parameters
twice. Avoiding running the same simulation twice is a primitive form of simulation control.

The  knowledge  and  simulation  space  operations  are  based  on  knowledge  and 
ontological  reasoning.  The  operations  are  similar  to  the  forward-chaining  method  above 
but  with  a  label  added  to  the  rules  denoting  the  type  of  relationships/edges  between 
entities. 

Production  systems  have  the  following  advantages: 

o  They  are  plausible  psychologically. 

o  They  are  ideal  for  homogeneous  knowledge  representation  in  the  form  of 
rules. 

o  They  can  be  highly  modular  as  each  rule  is  theoretically  independent. 

o  They  allow  incremental  rule  growth  if  modularity  is  maintained. 

o  They  allow  well-defined  and  almost  unrestricted  communication  between 
pieces  of  knowledge. 

o  They  are  a  natural  way  to  represent  many  kinds  of  knowledge. 

Production  systems  are  not  without  disadvantages  however;  WIZER  ameliorates  some  of 
the  disadvantages: 

o Inefficient: production systems are inefficient due to the explosion in the number
of rules and inferences. WIZER, however, avoids the explosion of rules and
assertions by tying them to the causal relations and thus to the simulation
model. Moreover, the match complexity of rule clauses is ameliorated by the
organization of facts and rules using ontology and causal constraints. These
constraints allow ontology-based modularization of rules and facts. This is
similar to the RETE algorithm (Forgy 1982), but based on ontology.

o Opaqueness and unexpected interaction: it is very hard to figure out the
effects of adding a rule and its interdependencies. WIZER reduces this problem
by using causal relations and ties to the simulation model. Furthermore, as the
simulation system is tied to the rules, the resulting interaction of changed rules
can be tested by running the simulation. This is assisted by using the ontology
in the form of Simulation Description Logic.

o Difficult to design: WIZER simplifies design by tying the rule design to the
causal relation design (which is easier) and to the simulation modeling (which
should be conceptually clear). Thus it should be easier to follow a sequence: first
designing the simulation model, then the causal relations, and finally the if-then rules.

o Difficult to debug: as rules are tied to causal relations, debugging the rules
should be easier as they are modularized by their causal relations. The causal
relations themselves should also be easier to debug as they are tied to the
simulation model and to the mechanisms in the form of procedural simulation
pseudocode.

o Is knowledge really rule-based? At the deepest level of human knowledge
(wisdom, learning, and creativity), no. However, many forms of knowledge
can be expressed as rules. Furthermore, however superficial the rules may be,
they may be able to get the job done, as evidenced in the rule-based justice
systems and some financial/business operations. This dissertation argues that
knowledge is really process-based, and can thus be simulated closely.
(Simulation is one of the best tools to mirror processes; another important tool
is mathematics.)


4.6.1 Variable, Rule, and Causation Definition

The variables in WIZER can have values which are Boolean, integer, real, curve (an
array of real numbers), or symbolic. Additionally, they have upper and lower limits on
the value when applicable. Each variable also has fields for its name and for attributes
such as belief, alert, and changeable. The variables correspond to the nodes in the graph
of the simulation knowledge space or the empirical knowledge space. In essence, a variable
has the following fields:

Variable = <name, value, alert, belief, changeable, priority, is_inferred_or_not>

The attributes of a variable have the following meanings:

o Belief: the probability of the variable value being correct.

o Alert: the symbolic value denoting qualitative “alerts” or classifications of
data.

o Changeable: the degree to which a variable value can be changed.

o Priority: the experiment control value to prioritize probing of variables.

o Is_inferred_or_not: a Boolean value denoting whether the value of a variable
is empirical or inferred.

The rules are defined as follows. Each rule statement has left-hand side
(LHS) variables and right-hand side (RHS) variables. It has an indicator for whether
the variables are in conjunctive normal form (CNF) or disjunctive normal form (DNF).
The default in WIZER is CNF. It has fields for name, label, and the attributes of belief,
changeable, priority, and an indicator for whether or not this rule/causation is inferred. In
essence,

Rule = <name, label, LHS, RHS, belief, changeable, priority, is_inferred_or_not>

The implication format of the rule is LHS → RHS, where the arrow sign denotes the
implication. It also has the label “if-then” denoting that this rule belongs to the if-then
rule type. Furthermore, each rule has ties to the ontology, to the causal relations, and to
the simulation model. The attributes of the “if-then” rules are:

o Belief: the probability of the rule being correct.

o Label: the type of relation, in this case “if-then”.

o Changeable: the degree to which a rule can be changed.

o Priority: the experiment control value to prioritize probing of rules.

o Is_inferred_or_not: a Boolean value denoting whether the rule is inferred. If
the rule is not inferred, it is either empirical or based on a model.

The causal rules are defined similarly to the “if-then” rules. In essence,
causal relations have the following fields:

Causation = <name, label, LHS, RHS, belief, changeable, priority,
is_inferred_or_not>

The implication format of the causation is LHS → RHS, with the “causation” label
denoting this implication as being of the causation type. Other labels include
“correlation”, “if-then”, and “convertible”. These correspond to the edges of the graph of
the simulation knowledge space and the empirical knowledge space. The attributes of the
causation rules are:

o Belief: the probability of the causal relation being correct.

o Label: the type of relation, in this case “causation”.

o Changeable: the degree to which a causal relation can be changed.

o Priority: the experiment control value to prioritize probing of causations.

o Is_inferred_or_not: a Boolean value denoting whether the rule is inferred. If
the rule is not inferred, it is either empirical or based on a model.

The “belief” and “changeable” attributes are similar, but denote two different
things.  Even  if  a  causal  relation  is  100%  correct,  it  can  still  be  changed.  Inferred  causal 
rules  are  weaker  than  either  empirical  or  model-based  causal  ones,  thus  another  attribute 
“is  inferred  or  not”  is  included  to  note  the  difference. 
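
The two tuples above share the same fields, so they can be sketched as a single record type. This is a minimal illustration, not WIZER's actual data structure: the class name `Relation`, the numeric ranges for `belief` and `changeable`, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One LHS -> RHS relation carrying the attributes listed in the text."""
    name: str
    label: str          # "if-then", "causation", "correlation", or "convertible"
    lhs: str
    rhs: str
    belief: float       # probability that the relation is correct
    changeable: float   # degree to which it can be changed (0..1 scale is assumed)
    priority: int       # experiment-control value used to order probing
    is_inferred: bool   # False means the relation is empirical or model-based

    def __post_init__(self):
        # A belief is a probability, so reject values outside [0, 1].
        if not 0.0 <= self.belief <= 1.0:
            raise ValueError("belief must lie in [0, 1]")


# Hypothetical sample values, chosen only to show that belief and
# changeable are independent attributes.
rule = Relation("r1", "if-then", "infection-rate is above limit",
                "op-lower base-rate", 0.9, 0.5, 1, False)
cause = Relation("c1", "causation", "base-rate", "infection-rate",
                 1.0, 0.8, 2, False)
```

The `cause` record has a belief of 1.0 and is still changeable, which mirrors the point that the two attributes denote different things.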


4.6.2  Conflict  Resolution  Strategy 

WIZER uses the following conflict resolution strategies:

1. Semantics-based conflict resolution when the information is available. This is
based on the Simulation Description Logic and its inference.

2. Absent the ontological/semantic information, the conflict is resolved by
numerical  weighting  of  the  <belief,  changeable,  priority,  is_inferred_or_not> 
factors. 

3.  The  combination  of  the  above  two,  by  reasoning  about  the  numerical  weighting 
factors. 

Compounding variables (variables that must be simultaneously on, off, or have certain
values) are resolved based on the perturbation of the system description contained in the
simulation knowledge space (and its ontology). This includes the Boolean operations
AND, Inclusive-OR, Exclusive-OR, and NOT. For real values, AND corresponds to
positive correlation, OR to choices, and NOT to negative causation or non-existence.
Additionally, the value and link/model adjustment is considered during the perturbation
of the system description using knowledge-based and ontological reasoning. The policy
question at hand can constrain both the conflict resolution strategy and the value and
link/model adjustment. All of this knowledge is encoded in knowledge bases and
ontology. 
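
The second strategy, numerical weighting of the <belief, changeable, priority, is_inferred_or_not> factors, can be sketched as a scoring function over the conflict set. The text does not give the weighting formula, so the linear form, the weight values, and the penalty for inferred rules below are purely illustrative assumptions.

```python
def weight(rule, w_belief=1.0, w_changeable=1.0, w_priority=1.0,
           inferred_penalty=0.5):
    """Score one rule from its four attributes (assumed linear combination)."""
    score = (w_belief * rule["belief"]
             + w_changeable * rule["changeable"]
             + w_priority * rule["priority"])
    # Inferred rules are weaker than empirical or model-based ones.
    if rule["is_inferred"]:
        score -= inferred_penalty
    return score


def rank_conflict_set(rules, **weights):
    """Order the conflict set from the highest to the lowest score."""
    return sorted(rules, key=lambda r: weight(r, **weights), reverse=True)


# Hypothetical conflict set; the attribute values are made up.
conflict_set = [
    {"name": "a", "belief": 0.9, "changeable": 0.5, "priority": 0.2, "is_inferred": False},
    {"name": "b", "belief": 0.9, "changeable": 0.5, "priority": 0.2, "is_inferred": True},
    {"name": "c", "belief": 0.4, "changeable": 0.9, "priority": 0.8, "is_inferred": False},
]
ranking = [r["name"] for r in rank_conflict_set(conflict_set)]
```

Rules `a` and `b` differ only in whether they are inferred, so the penalty alone places `b` below `a`.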

As  an  example,  suppose  that  the  following  causal  model  is  defined  for  a 
simulation  routine,  written  in  N3: 

<ailment-effective-radius>  <causes>  <infection-rate>  . 
<ailment-exchange-proximity-threshold>  <causes>  <infection-rate>  . 

<base-rate>  <causes>  <infection-rate>  . 

<vaccination-rate>  <hinders>  <infection-rate>  . 

<vaccination-rate>  <hinders>  <base-rate>  . 

<infection-rate>  <is  convertible  with>  <incidence-rate>  . 
where  the  verb  “causes”  means  to  influence  positively,  “hinders”  means  to  influence 
negatively,  “is  convertible  with”  means  the  value  can  be  transformed  mathematically. 
The  ailment-effective-radius  variable  denotes  the  radius  within  which  people  can  get 
infected  with  the  initial  release  of  a  disease  agent.  The  ailment-exchange-proximity- 
threshold  variable  denotes  the  distance  within  which  a  person  can  infect  another  with  a 
disease  agent.  For  infectious  and  communicable  diseases  like  influenza  the  two  factors 
are  closely  correlated.  The  base-rate  denotes  the  percentage  of  people  who  are  susceptible 
to  the  disease  agent. 

The if-then rules related to the above causal model are (for simplicity, only for the
case of the infection rate being above the limit):

<infection-rate is above limit> <if-then> <op-lower ailment-effective-radius> .
<infection-rate is above limit> <if-then>
<op-lower ailment-exchange-proximity-threshold> .
<infection-rate is above limit> <if-then> <op-lower base-rate> .
<infection-rate is above limit> <if-then> <op-higher vaccination-rate> .
<incidence-rate is above limit> <if-then> <infection-rate is above limit> .

where the prefix “op” denotes the operation on parameter/node values. Suppose now that
the Alert module gives out the fact that <incidence-rate is above limit> from a
comparison between simulation outputs and the empirical data. The inference then gives
a notice that <infection-rate is above limit>, and all four rules have their LHS
true and are entered into the conflict set. Assume the following knowledge exists for the
simulation:

<vaccination> <exists> <false> .
<vaccination> <causes> <vaccination-rate> .
<ailment-effective-radius> <positively correlates with>
<ailment-exchange-proximity-threshold> .

The Inference Engine deduces that

<vaccination-rate> <exists> <false> .

and thus the rule

<infection-rate is above limit> <if-then> <op-higher vaccination-rate> .

cannot fire (and is not entered into the fire set, that is, the set of rules to fire). The other
rules, which do fire, are:

<infection-rate is above limit> <if-then> <op-lower ailment-effective-radius> .
<infection-rate is above limit> <if-then>
<op-lower ailment-exchange-proximity-threshold> .
<infection-rate is above limit> <if-then> <op-lower base-rate> .

The first two rules should fire together due to their correlation if it is so specified, while
the last one can fire independently. But in this case, as there are no facts preventing them
from firing, all will fire.
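
The inference in this example can be traced with a small forward-chaining sketch. It is an illustration under simplifying assumptions, not WIZER's Inference Engine: rules are plain (LHS, RHS) string pairs, an operation is recognized by the "op-" prefix, and non-existence is propagated along "causes" links only. All function and variable names are hypothetical.

```python
IF_THEN = [
    ("infection-rate is above limit", "op-lower ailment-effective-radius"),
    ("infection-rate is above limit", "op-lower ailment-exchange-proximity-threshold"),
    ("infection-rate is above limit", "op-lower base-rate"),
    ("infection-rate is above limit", "op-higher vaccination-rate"),
    ("incidence-rate is above limit", "infection-rate is above limit"),
]
CAUSES = [("vaccination", "vaccination-rate")]
NONEXISTENT = {"vaccination"}


def forward_chain(alerts, rules):
    """Derive all clauses reachable from the alerts and collect the conflict set."""
    known = set(alerts)
    conflict_set = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in known:
                continue
            if rhs.startswith("op-"):
                # An operation rule whose LHS is true enters the conflict set.
                if (lhs, rhs) not in conflict_set:
                    conflict_set.append((lhs, rhs))
            elif rhs not in known:
                known.add(rhs)
                changed = True
    return known, conflict_set


def propagate_nonexistence(nonexistent, causes):
    """If a cause does not exist, treat its effect as non-existent as well."""
    missing = set(nonexistent)
    changed = True
    while changed:
        changed = False
        for cause, effect in causes:
            if cause in missing and effect not in missing:
                missing.add(effect)
                changed = True
    return missing


def fire_set(conflict_set, missing):
    """Keep only operations whose target variable exists."""
    kept = []
    for lhs, rhs in conflict_set:
        target = rhs.split(" ", 1)[1]  # "op-lower base-rate" -> "base-rate"
        if target not in missing:
            kept.append((lhs, rhs))
    return kept


known, conflicts = forward_chain({"incidence-rate is above limit"}, IF_THEN)
missing = propagate_nonexistence(NONEXISTENT, CAUSES)
to_fire = fire_set(conflicts, missing)
```

With the alert as the only starting fact, four rules enter the conflict set and three remain in the fire set, since the vaccination-rate rule is dropped.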

The  amount  of  change  to  the  value  if  the  “op”  prefixed  clause  is  true  is 
determined  by  existing  knowledge  if  available,  or  else  by  a  simple  divide-and-conquer 
probe. The divide-and-conquer probe looks at the extreme values first, then the midway
values, then the midway of the midway values, and so on. It is similar to binary search.
Suppose here we have the following knowledge:

<ailment-effective-radius> <has-values-of> <10 m, 50 m, 100 m> .
<ailment-exchange-proximity-threshold>  <has-values-of>  <10  m,  50  m,  100  m>  . 
<base-rate>  <has-values-of>  <0.10,  0.30,  0.50>  . 

Suppose again that the current values are 10 m for both ailment-effective-radius and
ailment-exchange-proximity-threshold and 10% for base-rate. The adjustments are decided
by the Inference Engine to be: 50 m for both ailment-effective-radius and
ailment-exchange-proximity-threshold and 30% for base-rate. When more sophisticated knowledge
for deciding whether and how to change the base-rate value is available (e.g., taking into
account  people  who  are  in  the  time  and  place  of  the  disease  agent  release  and  their 
immunity  status),  more  detailed  probes  into  how  base-rate  should  change  are  made 
feasible.  Using  knowledge  inference  in  this  way,  combinatorial  explosion  of  parameter 
values  can  be  somewhat  tamed.  This  is  similar  to  the  approach  taken  in  policy  analysis  in 
which  all  options  are  carefully  thought  over  so  that  no  brute  force  search  is  necessary. 
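
The ordering used by the divide-and-conquer probe (extremes first, then midpoints, then midpoints of the resulting halves) can be sketched as follows. This is a minimal illustration assuming a continuous value range. The function name `probe_order` and the `depth` cutoff are hypothetical and not part of WIZER.

```python
from collections import deque


def probe_order(low, high, depth):
    """Return probe values: the two extremes, then midpoints level by level."""
    order = [low, high]
    intervals = deque([(low, high, 1)])
    while intervals:
        a, b, level = intervals.popleft()
        if level > depth:
            continue
        mid = (a + b) / 2
        order.append(mid)
        # Split the interval and visit both halves at the next level.
        intervals.append((a, mid, level + 1))
        intervals.append((mid, b, level + 1))
    return order


probes = probe_order(0, 100, 2)
```

The queue makes the traversal breadth-first, so every midpoint at one level is probed before any midpoint at the next, which is what distinguishes this ordering from a plain binary search that follows a single branch.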


4.6.3  Value  and  Link/Model  Adjustment 

The  amount  of  change  to  a  parameter  value  is  determined  by  the  alert  type  and  the 
knowledge  and  ontological  inference  for  that  parameter.  If  a  continuous  and  dynamic 
relationship  is  known,  the  adjustment  amount  may  be  derived  from  differential  equations. 

For a link or model adjustment, the model is perturbed minimally so that the
next simulation fits the empirical data better. The perturbations proceed
from the least to the most disruptive: change in the parameter values,
change in causation links, and change in the meta-models. How the model is perturbed is
based  on  the  knowledge-base  and  ontology  about  the  model,  including  the  assumptions 
about  the  model.  The  procedural  code  of  the  model  is  described  by  Simulation 
Description  Logic  and  can  be  used  to  help  determine  the  changes. 

The  conflict  resolution  and  the  value/rule  adjustment  are  implemented  in  one 
module.  This  module  has  a  supporting  knowledge  base  for  the  purpose  of  model 
perturbations  and  conflict  resolution. 

The  nature  of  the  physical  and  social  world  can  sometimes  help  in  value 
adjustment.  For  example,  in  the  physical  world  the  road  network  restricts  part  of  human 
mobility.  Adjustments  in  the  values  of  human  mobility  may  only  be  needed  along  the 
transportation networks, as a first approximation. No explosive search in a two-dimensional
cellular space is needed. In the social world, the mobility of children is
restricted during school hours. Zoning laws affect adult mobility patterns. These
constraints of the physical and social world can be best captured by ontology and
knowledge bases. As an example, a rough ontology for children can be written in N3 as

<children> <goes to> <school> .

<children>  <rides>  <a  school  bus>  . 

<children>  <does  not  go  to>  <pharmacy>  . 

<children>  <has>  <curfew>  . 

<children>  <has>  <parents>  . 

<children>  <lives  with>  <parents>  . 

<children>  <plays  with>  <children>  . 

Of course, these relationships are not fixed. Specialized ontologies may be created for
more detailed descriptions of children with different backgrounds and ages.

In addition to curves, surfaces, volumes, and higher-dimensional manifolds must
sometimes be probed. Aided by knowledge bases and ontology, WIZER can probe the
critical points, sample the area around the points, and use the results to guide further
exploration in the manifolds. Absent knowledge, WIZER’s performance degenerates to
that of numerical techniques such as Monte Carlo and divide-and-conquer search.


4.7  Domain  Knowledge  Operations 


Domain knowledge is represented as a graph of knowledge structures along with its node
value ranges. It encodes both the conceptual layer (the T-Box) and the instantiation layer
(the A-Box) when data is available. The node has attributes denoting properties such as
the attribute has-type-of, which defines the possible types of the node or the adjective of
the node “noun”. There are different kinds of edges in the graph or network:
o Causation: encoding what causes what.
o If-Then: encoding what will infer what. Note that inference is not the same as
causation.
o Convertible: encoding what can be transformed mathematically to what.
o Correlation: encoding what correlates with what.
o Instance-of: encoding that this node is an instantiation of a conceptually
higher node.
o Concept-of: encoding that this node is the conceptual “parent” of another node.
o Other relationships or “verbs”.

The  operations  on  domain  knowledge  are  based  on  the  Simulation  Description  Logic. 
The  knowledge  can  be  described  in  the  modified  N3  notation. 

The  operations  are  implemented  in  a  graph  search  algorithm  with  edge  labels 
denoting  the  graph/network  attributes,  which  is  to  say,  the  values  of  the  edges  of  the 
graph. The forward chaining inference rules correspond to the graph searches. The rules
have access to and control over the graph searches. They become different kinds of rules
depending  on  the  attribute  label.  If  the  label  has  the  attribute  of  “causation”,  it  becomes 
the  causation-type  production  rule.  If  it  has  the  attribute  of  “correlation”,  it  becomes  the 
correlation-type  production  rule.  The  rules  mentioned  here  are  to  be  understood  in  the 
context  of  production  systems. 

The  queries  that  can  be  answered  by  the  operations  include  “what  properties  a 
node  has”,  “find  what  class  a  node  belongs  to”,  “find  all  nodes  that  are  part  of  this  node”, 
“find  all  nodes  that  correlate  with  this  node”,  “find  all  nodes  that  influence  this  node”, 
etc.  The  answers  are  found  by  searching  the  graph  and  controlled  by  rules. 
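
Queries of this kind can be sketched as searches over a labeled edge list. The class below is a minimal illustration with hypothetical names (`KnowledgeGraph`, `sources`, `correlated`), not WIZER's implementation. The edge labels follow the kinds listed above, and the sample nodes are taken from the earlier epidemiological example.

```python
from collections import defaultdict


class KnowledgeGraph:
    def __init__(self):
        self.edges = []                      # (subject, label, object) triples
        self.attributes = defaultdict(dict)  # node -> {attribute: value}

    def add_edge(self, subject, label, obj):
        self.edges.append((subject, label, obj))

    def set_attribute(self, node, name, value):
        self.attributes[node][name] = value

    def properties(self, node):
        """Answer 'what properties a node has'."""
        return dict(self.attributes.get(node, {}))

    def sources(self, node, labels):
        """Answer 'find all nodes that influence this node' for the given labels."""
        return sorted({s for s, label, o in self.edges
                       if o == node and label in labels})

    def correlated(self, node):
        """Answer 'find all nodes that correlate with this node' (symmetric)."""
        outgoing = {o for s, label, o in self.edges
                    if s == node and label == "correlation"}
        incoming = {s for s, label, o in self.edges
                    if o == node and label == "correlation"}
        return sorted(outgoing | incoming)


g = KnowledgeGraph()
g.add_edge("ailment-effective-radius", "causation", "infection-rate")
g.add_edge("base-rate", "causation", "infection-rate")
g.add_edge("ailment-effective-radius", "correlation",
           "ailment-exchange-proximity-threshold")
g.add_edge("infection-rate", "convertible", "incidence-rate")
g.set_attribute("management", "has-type-of", ["homogeneous", "heterogeneous"])
```

A rule can restrict the search by passing only the labels it cares about, which is one way to read the statement that the rules control the graph searches.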

As  an  example,  suppose  that  the  management  node  has  the  type  property  value  of 
either  homogeneous  or  heterogeneous.  The  search  performed  to  answer  the  query  “find 
all types of the node management” includes the localization of the node and probing of
the values of the node’s type attribute. This corresponds to the conceptual definition (the
T-Box) of management. If specific data are available, the instantiation definition (the
A-Box) of a specific management is feasible. This is an example of ontological reasoning in
WIZER. The result of ontological reasoning is new values for Inference Engine rules.
The new values can be used to perform what-if analyses as they are new hypotheses.
Furthermore, a rule such as “if the management is homogeneous, then probe other types
of management” can trigger ontological reasoning. Rules have access to the ontology for
reasoning.


4.8 Simulation Knowledge Operations


Simulation knowledge is also structured as a graph similar to the domain knowledge
graph. It contains simulated or inferred output values and clauses. The nodes of the graph
denote variables. The edges of the graph denote relationships between variables. The
nodes have slots denoting properties. The node is analogous to the noun in the English
language, the edge to the verb, and the node attribute to the adjective. The
description of an edge, the adverb, while feasible, is not implemented. The relationships
implemented are causal, if-then, and convertible relationships.

The operations on simulation knowledge are based on the Simulation Description
Logic.  They  use  a  graph  search  algorithm  like  the  one  used  in  the  domain  knowledge 
operation.  The  graph  search  corresponds  to  a  forward  chaining  inference.  Simulation 
knowledge  operations  are  tied  to  the  simulation  model,  whereas  the  domain  knowledge 
does  not  necessarily  have  direct  ties  to  the  simulation  model. 

The reason for separating simulation and domain knowledge
is that the simulation knowledge is owned by the simulation developers, whereas the
domain knowledge belongs to the validators or VV&A practitioners, who use it as a
standard to judge simulators against.


4.9  Validation  Submodule 


The  validation  submodule  computes  the  degrees  of  validity  of  a  simulator.  As  validation 
depends  on  domain  knowledge,  validation  is  measured  against  a  definite  piece  of 
knowledge.  For  example,  a  simulator  may  output  valid  behaviors  in  terms  of  school 
absenteeism,  while  not  necessarily  in  terms  of  other  pieces  of  knowledge.  Thus  every 
validation  is  measured  based  on  a  specified  piece  of  knowledge  underlying  a  data  stream 
of  simulation  outputs  and  occurrences. 


On  the  conceptual  model  level,  it  takes  the  domain  knowledge  graph  and 
simulated  knowledge  graph  as  input  to  calculate  the  intersection  between  them.  The 
extent  of  the  intersection  determines  the  degree  of  validity  of  the  simulator  model.  This 
assumes  domain  knowledge  and  empirical  data  are  correct. 

Thus,  there  are  two  measures  of  validation;  one  for  data/outputs/behaviors 
validity  and  the  other  for  conceptual  validity. 
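
The conceptual measure can be sketched as an edge-set intersection between the two graphs. The text does not state how the extent of the intersection is normalized, so dividing by the number of domain edges is an assumption. The function name and the extra simulation edge (`school-closure`) are hypothetical.

```python
def conceptual_validity(domain_edges, simulation_edges):
    """Fraction of domain-knowledge edges also present in the simulation graph."""
    domain = set(domain_edges)
    if not domain:
        raise ValueError("domain knowledge graph has no edges")
    shared = domain & set(simulation_edges)
    return len(shared) / len(domain)


domain_graph = [
    ("ailment-effective-radius", "causes", "infection-rate"),
    ("base-rate", "causes", "infection-rate"),
    ("vaccination-rate", "hinders", "infection-rate"),
    ("infection-rate", "is convertible with", "incidence-rate"),
]
simulation_graph = [
    ("ailment-effective-radius", "causes", "infection-rate"),
    ("base-rate", "causes", "infection-rate"),
    ("vaccination-rate", "hinders", "infection-rate"),
    ("school-closure", "hinders", "infection-rate"),  # hypothetical extra edge
]
score = conceptual_validity(domain_graph, simulation_graph)
```

Normalizing by the domain graph treats domain knowledge as the standard, consistent with the assumption that domain knowledge and empirical data are correct. Simulation edges outside the domain graph neither raise nor lower the score.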


4.10  Model-Improvement  Submodule 


This submodule takes the intersection of domain and simulated knowledge graphs and
calculates the changes to the simulated knowledge graph (and thus the simulation
parameters and meta-models) needed to better align the two. Better alignment gives
better  validity.  A  simple  method  for  alignment  is  to  search  in  the  domain  knowledge 
space  to  find  links  that  do  not  exist  in  the  simulation  knowledge  space.  Guided  by 
ontology  and  semantics,  these  new  links  or  modified  ones  are  then  added  to  the 
simulation  knowledge  space.  The  simulation  links  can  also  be  deleted  when  warranted. 
The  above  alignment  method  is  similar  to  morphing.  This  module  performs  hypothesis 
building  for  a  more  advanced  morphing.  After  the  addition  of  the  links,  simulations  are 
run to test the links.


4.11 Ontological Reasoning in WIZER


Triggered  by  rules  or  scenario  plans,  WIZER  performs  ontological  reasoning.  The 
ontological  reasoning  is  implemented  in  the  form  of  a  graph  search  with  its 
corresponding  semantic  description.  The  graph  has  nodes,  node  attributes,  and  edges.  The 
simulation  knowledge  space  and  the  domain  knowledge  space  have  this  graph 
representation.  Rules  in  the  Inference  Engine  have  access  to  the  graph  and  thus  the 
ontology  for  reasoning.  Graph  searches  correspond  to  rule  inferences. 

WIZER  enables  the  Inference  Engine  to  have  rules  such  as  “find  all  nodes  that 
correlate with this node” and then “for all correlated nodes, run test T”. Thus the rule
inference  and  ontological  reasoning  support  each  other. 


4.12  Structural  Changes  and  WIZER 


The  currently  implemented  WIZER  deals  primarily  with  parameter  value  changes,  not  the 
(simulation)  structural  changes.  While  there  is  a  simple  structural  change  handled  by 
WIZER  for  the  CONSTRUCT  Validation  III  in  Chapter  7  (the  heterogeneous  workers 
heterogeneous  management  case),  this  dissertation  does  not  show  more  comprehensive 
results  of  structural  changes  enabled  by  WIZER.  The  simple  structural  change  of  the 
CONSTRUCT  Validation  III  is  in  the  form  of  ontological  reasoning:  changing  the  type  of 
management  from  homogeneous  to  heterogeneous. 

In  the  conflict  resolution  and  the  value/link  adjustment  process  of  WIZER, 
minimal  model  perturbations  are  performed  by  changing  causal  relations.  How  causal 
relations  are  changed  depends  on  domain  knowledge,  simulation  knowledge,  empirical 
data,  simulation  results,  and  ontology. 

From a knowledge-based perspective, parameter values and links/edges are both
pieces of knowledge. The distinction is that a parameter value directly influences one
node, while a link/edge directly influences two nodes. But at the knowledge level, this
distinction does not really matter. All that matters is determining the effects of a
change in a piece of knowledge on the simulation when such a change is made. In the
context of validation, if the change results in better validity, then that piece of knowledge
(regardless of whether it is a parameter value or a link/edge) is good and the change is
good. In the context of model improvement, if the change in the piece of knowledge
results in a better model, then that change is good, regardless of whether it is changing
simple parameter values, links/edges, or even more complex meta-models. The pieces of
knowledge and their relationships are encoded in the causal relations, “if-then” rules,
process model/logic, and other types of inference in the knowledge-based systems.
Parameter values and links/edges/rules are both first-class objects (roughly speaking, they
are equals). Based on the above knowledge-level view, if WIZER can handle parameter
value change well, it must be able to handle link/edge change well too. This is neither a
conceptual problem nor a software design problem, but simply a matter of programming.


4.13  An  Example  of  Simple  WIZER  Runs 


This example is based on four runs of the BioWar simulator for 100%-scale Hampton city
(population 146,437 persons) with no biological attacks (i.e., intentional disease
releases).

FIRST ITERATION

• Alert outputs:

• ER registration is above the empirical bound of 0.232 visits per person per
year

edregistration-yearly-avg, 2.24856, 0.056, 0.232, above the bounds

• Doctor visit is above the empirical bound of 1.611 visits per person per
year

insuranceclaim-yearly-avg, 3.16572, 0.415, 1.611, above the bounds

• School absenteeism is within the empirical bound of absence rate

school-yearly-avg, 3.62525, 3.04, 5.18, within bounds

• Inference Engine outputs:

• Increase the behavior threshold for going to ER (by a constant amount)

• Increase the behavior threshold for going to doctor office

• Leave alone the behavior threshold related to school going behavior

• 3 data streams with 1 data stream within bounds → 33% validity

SECOND ITERATION

• ER registration is within the empirical bounds

edregistration-yearly-avg, 0.0818865, 0.056, 0.232, within bounds.

• 3 data streams with 2 data streams within bounds → 67% validity

The example covers WIZER submodules as follows.

The example corresponds to the situations when agents get sick and manifest
symptoms of illness. The severity of symptoms governs where agents will go in the next
time step. The thresholds of severity that trigger certain movement behaviors are named
accordingly. These thresholds affect the visit rates of places. In N3,

<going to work threshold> <influences positively> <work presence rate> .
<going to work threshold> <influences negatively> <pharmacy visit rate> .
<pharmacy visit threshold> <influences positively> <pharmacy visit rate> .
<pharmacy visit threshold> <influences negatively> <doctor visit rate> .
<doctor visit threshold> <influences positively> <doctor visit rate> .
<doctor visit threshold> <influences negatively> <emergency room visit rate> .

The verbs denote the relationships between the behavioral thresholds of going to
places and the places’ visit rates. An agent’s decision of going to work, pharmacy, or
doctor is based on the thresholds and on the agent’s health status, that is, whether the
agent is sick, what disease(s) the agent has, and how severe the agent’s symptoms are.

The domain knowledge space has the same format as the simulation knowledge
space. In the simple example above, the domain knowledge space is the same as the
simulation knowledge space. In general, however, the domain knowledge space is larger
than  the  simulation  knowledge  space. 

In  the  example,  the  alerts  are  in  the  form  of  value-too-high  and  value-too-low  for 
bound  checks  of  the  simulation  output  value  against  empirical  minimum  and  maximum 
values.  WIZER  compares  the  annual  mean  value  of  numerous  data  streams  against  the 
empirical  data  and  gives  out  the  alerts.  As  noted,  4  simulation  runs  of  Hampton  city  with 
100%  scale  are  done.  In  this  case,  the  output  data  is  averaged  over  the  4  simulation  runs, 
before  the  Alert  does  its  symbolic  categorization. 

• ER registration is above the empirical bound of 0.232 visits per person per
year

edregistration-yearly-avg, 2.24856 (simulation yearly average),
0.056 (empirical min), 0.232 (empirical max), above the
bounds

• Doctor visit is above the empirical bound of 1.611 visits per person per
year

insuranceclaim-yearly-avg, 3.16572, 0.415, 1.611, above the bounds

• School absenteeism is within the empirical bound of absence rate

school-yearly-avg, 3.62525, 3.04, 5.18, within bounds


Of  the  four  simulations  in  the  trial,  all  give  consistent  results  of  ER  registration  being 
above  the  empirical  bound,  doctor  visit  being  above  the  empirical  bound,  and  school 
absenteeism  being  within  the  empirical  bounds. 
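
The bound check behind these alerts, and the validity figure derived from them, can be sketched as follows. This is a minimal illustration using the values reported above. The function names and the "below the bounds" label for the value-too-low case are assumptions, not WIZER's actual Alert interface.

```python
def bound_check(stream, value, low, high):
    """Categorize an averaged simulation output against empirical min and max."""
    if value > high:
        status = "above the bounds"
    elif value < low:
        status = "below the bounds"  # assumed label for the value-too-low alert
    else:
        status = "within bounds"
    return (stream, value, low, high, status)


def validity(alerts):
    """Fraction of data streams whose value lies within the empirical bounds."""
    within = sum(1 for alert in alerts if alert[-1] == "within bounds")
    return within / len(alerts)


first_iteration = [
    bound_check("edregistration-yearly-avg", 2.24856, 0.056, 0.232),
    bound_check("insuranceclaim-yearly-avg", 3.16572, 0.415, 1.611),
    bound_check("school-yearly-avg", 3.62525, 3.04, 5.18),
]
first_validity = validity(first_iteration)

# Second iteration: only the ER value is reported in the text.
er_second = bound_check("edregistration-yearly-avg", 0.0818865, 0.056, 0.232)
```

One of the three streams is within bounds in the first iteration, so `first_validity` is one third, which is the 33% validity reported for that iteration.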

An  example  of  the  operation  of  the  Inference  Engine  is  as  follows. 

•  ER  visit  rate  is  too  high  so  increase  the  behavior  threshold  for  going  to  ER 
(by  a  constant  amount) 

•  Doctor  visit  rate  is  too  high  so  increase  the  behavior  threshold  for  going  to 
doctor  office 

•  School  visit  rate  is  within  bounds  so  leave  alone  the  behavior  threshold 
related  to  school  going  behavior 

•  3  data  streams  with  1  data  stream  within  bounds:  33%  validity 


As the simulation knowledge space and the domain knowledge space are assumed to
be the same, it is the degree of match between simulation outputs and empirical data that
is counted toward the (total) validation level. For the first iteration of the simulation, we
have

• 3 data streams with 1 data stream within bounds → 33% validity

For the second iteration, we have

• 3 data streams with 2 data streams within bounds → 67% validity


4.14 Comparisons of WIZER to Other Tools


Few  multi-agent  simulations  have  exploited  the  depth  and  breadth  of  available 
knowledge  and  information  for  validation  that  resides  in  journals,  textbooks,  websites, 
human  experts,  and  other  sources.  Typically,  simulation  results  are  designed  solely  for 
human  analysis  and  validation  is  provided  by  subject  matter  experts  employing  the  labor- 
intensive  and  tedious  VV&A  process. 

WIZER  is  unique  in  that  it  utilizes  ontological  and  knowledge-based  inference  for 
validation  and  model-improvement.  It  strives  to  use  as  much  deep  and  profound 
knowledge  as  possible  by  making  use  of  works  in  description  logics  and  ontological 
reasoning.  WIZER  seeks  to  emulate  scientists  doing  experiments  and  analyses  via  the 
scientific  method,  instead  of  simply  providing  a  programming  environment. 

While other toolkits such as Swarm (http://wiki.swarm.org), TAEMS (O’Hare and
Jennings  1995,  Lesser  et  al.  2004),  and  Repast  (http://repast.sourceforge.net)  are  designed 
with  the  goal  of  assisting  the  design  and  implementation  of  agent-based  systems,  WIZER 
is  designed  to  help  with  scientific  experimentation,  validation,  analysis,  and  model 
improvement.  WIZER  is  conceptually  able  to  run  on  top  of  any  simulation  system, 
including  those  constructed  using  Swarm  and  Repast  toolkits  provided  that  corresponding 
knowledge  bases  are  provided.  WIZER  is  basically  a  causal  and  logical  reasoning, 
experimentation,  and  simulation  control  engine  with  statistical  and  pattern  recognition 
capabilities.  This  is  similar  to  techniques  scientists  employ  when  forming  hypotheses  and 
designing,  executing,  and  analyzing  experiments  for  hypothesis  testing. 

WIZER differs from evolutionary programming (Fogel 1999), evolutionary
strategies,  and  genetic  algorithms.  WIZER  does  not  need  a  population  of 
mutation/crossover  candidates  nor  does  it  need  the  mutation,  crossover,  and  other 
evolutionary  and  genetic  constructs.  Instead,  WIZER  applies  knowledge  inference  to 
simulations to design the next simulation run, based on the scientific experimental method. If
the result of inferences mandates a radical change, a revolution will occur.

The following table shows the comparison between WIZER and other tools.


Table 3. Features Comparison

                               WIZER                        Swarm/TAEMS/Repast   Evolutionary Strategies             Data Farming
Programming environment?       No                           Yes                  No                                  No
Unit of inference              Rule and causation           None                 Evolutionary and genetic operators  Data growing heuristics
Object of operation            Simulation, data, knowledge  Code                 Simulation and data                 Data
Experimentation?               Yes, automated               Yes, human operated  Yes, automated (fitness)            No
Automated simulation control?  Yes                          No                   Yes                                 No
Knowledge operation?           Yes                          No                   No                                  No


4.15  Conclusion 


WIZER  is  a  knowledge-based  and  ontological  tool  for  validation  and  model- 
improvement  of  simulation  systems.  It  is  capable  of  emulating  the  basic  inferences  that 
human  experimenters  perform  to  validate  and  improve  simulation  models.  It  can  reduce 
the number of searches needed to validate simulation models as it makes use of knowledge
space  search  in  addition  to  parameter  space  search.  WIZER  is  powered  by  knowledge 
inference,  so  it  is  as  powerful  as  the  knowledge  and  the  inference  mechanisms  contained 
in  it.  WIZER  is  unique  as  it  focuses  on  the  analysis,  inference,  and  control  of  simulations 
instead  of  providing  a  verification  and  programming  environment. 

WIZER  is  limited  by  the  knowledge  inside  its  system  and  its  reasoning 
mechanisms.  If  the  majority  of  the  knowledge  is  wrong,  WIZER  will  output  wrong 
inferences  and  wrong  validations.  An  anchor  to  the  empirical  data  may  mitigate  this,  but 
how  to  change  existing  knowledge  based  on  data  remains  a  research  question.  Hypothesis 
building  and  testing  using  simulation  proxies  may  be  one  answer,  as  this  dissertation 
indicates. The reasoning mechanisms in WIZER currently consist of causal and IF-THEN
forward-chaining  mechanisms  and  the  ontological/semantic  reasoning.  WIZER  does  not 
incorporate  a  learning  mechanism,  except  for  the  simple  hypothesis  building  using  search 
in  the  ontologies  and  knowledge  bases  and  for  virtual  experiments  performed  to  test  the 
hypothesis which may result in the acquisition of new facts and relations.


CHAPTER V: Evaluation Criteria


As  a  knowledge-based  tool  for  validation  and  model-improvement,  evaluation  in  WIZER 
derives  in  part  from  employing  a  knowledge-based  system  in  the  evaluation.  Any 
knowledge-based  system  depends  to  a  large  extent  on  its  knowledge  bases,  in  addition  to 
its  inference  mechanisms. 

In  this  chapter,  I  present  simple  evaluation  criteria  for  validation:  value 
comparison,  curve  and  pattern  comparison,  statistical  comparison,  and  conceptual 
comparison.  This  is  similar  to  statistically  comparing  two  sample  distributions  (Box, 
Hunter,  and  Hunter  2005).  Additionally,  performance  (defined  as  how  quickly  and 
effectively  validation  is  completed)  can  be  measured  by  comparing  the  search  space  in 
parameter,  knowledge,  and  meta-model  spaces  before  and  after  knowledge  and 
simulation  inference. 

For validation, the outputs and occurrences of the simulation are converted into
symbolic knowledge by mathematical routines such as bound checking, which
examines how well the simulation outputs fit the empirical bounds. This is one
evaluation  criterion  for  validation.  Another,  more  profound,  criterion  is  whether  the 
behaviors  of  the  simulation  model  itself  fit  the  empirical  knowledge.  This  is  measured  by 
comparing  model  knowledge  bases  and  links  with  the  empirical  ones. 

For  performance  evaluation,  a  set  of  measures  gauges  the  effects  of  knowledge, 
simulation, and inference on the amount and the focus of search in the parameter,
meta-model, and knowledge spaces. This includes the comparison between knowledge-less and
knowledge-based  parameter  search  space. 

For  model-improvement  evaluation,  I  describe  a  simple  semantics-based 
comparison  of  validity  before  and  after  an  attempt  to  improve  the  simulation  model  by 
the  model-improvement  module. 

As  WIZER  is  a  knowledge-based  system,  the  ontology  (Gomez-Perez  et  al.  2004) 
and  the  need  to  include  the  knowledge  in  the  inference  engine  facilitate  a  clear  and 
succinct representation of subject matter experts’ and policy analysts’ knowledge and
judgment. Because of this emphasis on precision and transparency, the process and the
evaluation of validation and model-improvement are facilitated.


5.1  Statistical  Sample  Comparison 


Statistics  can  compare  samples  parametrically  and  non-parametrically.  To  use  parametric 
methods,  samples  must  have  a  normal  distribution  and  be  independent.  The  sample  size 
must  be  large  enough,  usually  more  than  30.  Absent  an  assumption  of  normal 
distribution,  non-parametric  methods  must  be  used. 

Parametric  methods  have  the  advantage  of  being  easy  to  use  and  understand.  They 
make  it  easy  to  quantitatively  describe  the  population  or  the  actual  difference  between 
populations.  The  methods  employ  established  statistical  distributions  (e.g.,  normal, 
Poisson,  and  Gamma  distributions).  The  disadvantage  of  parametric  methods  is  that  they 
require  the  assumption  of  the  underlying  statistical  distribution  for  the  sample.  A  skewed 
distribution  cannot  be  assumed  away. 

The  advantages  of  non-parametric  methods  include: 

1.  They  provide  an  aura  of  objectivity  when  there  is  no  reliable  underlying, 
universally  recognized,  scale  for  the  original  data  and  there  is  some  concern  that 
the  results  of  standard  parametric  techniques  would  be  criticized  for  their 
dependence  on  an  artificial  metric. 

2.  Non-parametric  tests  make  less  stringent  demands  of  the  data.  It  is  not  required, 
for  example,  that  normality  or  equal  standard  deviation  applies. 

3. Non-parametric tests can be used to get a quick answer with little calculation.

4.  They  can  be  employed  when  the  data  do  not  constitute  a  random  sample  from  a 
larger  population  and  standard  parametric  techniques  based  on  sampling  from 
larger  populations  are  no  longer  appropriate. 

The  disadvantages  of  non-parametric  methods  include: 

1. They still require random samples.

2. As they contain no parameters, it is difficult to make quantitative statements
about the actual difference between populations.

3. They throw away information; for example, the sign test uses only the signs of
the observations.

Parametric tests for comparing samples include the t-test, which compares two independent
samples. The equivalent non-parametric test is the Wilcoxon rank-sum test.
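The two tests named above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the routine used in WIZER: the function names `welch_t` and `rank_sum_z` are hypothetical, the t statistic uses the unequal-variance (Welch) form, and the rank-sum p-value relies on the normal approximation without a tie correction, which is an assumption that is reasonable only for moderately large samples.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """t statistic for two independent samples (unequal-variance form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum test via the normal approximation (no tie correction).

    Returns (z, two_sided_p). Tied values receive their average rank.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg = (pos + end) / 2 + 1  # average 1-based rank for a run of ties
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg
        pos = end + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Illustrative (made-up) samples standing in for simulated and empirical data.
simulated = [1, 2, 3, 4, 5]
empirical = [6, 7, 8, 9, 10]
t_stat = welch_t(simulated, empirical)
z_stat, p_value = rank_sum_z(simulated, empirical)
```

The rank-sum routine only uses the ordering of the pooled values, which is why it makes no demand of normality, at the cost of discarding the magnitudes.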

Building a simulation usually involves hypothesizing families of distributions for its stochastic
variables, estimating the statistical parameters, and determining how representative the
fitted distributions are. The goodness of fit of a distribution to the data is
determined by heuristic procedures and goodness-of-fit tests (Law and Kelton 2000).
Heuristic  procedures  include  density/histogram  overplots  and  frequency  comparisons, 
distribution  function  differences  plots,  and  probability  plots.  Goodness-of-fit  tests  include 
chi-square  tests,  Kolmogorov-Smirnov  tests,  Anderson-Darling  tests,  and  Poisson- 
process  tests. 


5.2  Validation  Evaluation  Criteria 


The  validation  evaluation  is  based  on  a  set  of  statistical  tests  of  simulation  events/outputs 
against  empirical  data.  It  consists  of  value  comparison,  curve  comparison,  pattern 
comparison,  statistical  comparison,  and  conceptual  comparison. 


5.2.1  Value  Comparison 

Value  comparison  simply  compares  the  value  of  a  simulation  output  stream  against  an 
empirical  value  (or  a  pair  of  empirical  values  in  the  form  of  the  minimum  and  maximum 
bounds).  One  simulation  trial  suffices  for  some  cases.  However,  given  multiple 
simulation  trials,  the  mean  and  standard  deviation  of  the  simulation  output  streams  are 
calculated and compared against the empirical values of mean and, if available, standard
deviation.

In data stream comparison, the semantics of the data cannot be neglected. For
example, annual school absenteeism has the semantics of counting absenteeism when
school is in session. This means summer vacation, holidays, Saturdays, and Sundays are
not counted and have no meaning of absenteeism.

If the simulation and empirical values match 100% with each other, then the
validity is 100% for the data stream. The semantics is noted for this data stream, using N3
notation:

<simulated data stream S> <is 100% validated with> <empirical data E> .

When the values match with n percent probability, it is noted as

<simulated data stream S> <is n percent probability validated with>
<empirical data E> .

When the mean and standard deviation of the simulation output data are available (assuming
a normal distribution) and an empirical value V is available to compare
against, a parametric confidence interval can be computed and the probability can be
assessed. In N3, the semantics is noted as

<simulated data streams S>
<is validated with n percent probability using 95% confidence interval>
<empirical value V> .

If no parametric distribution can be assumed for the simulation output data, then a
non-parametric confidence interval is computed for the median. A non-parametric significance
test can also be computed.
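The value comparison described in this subsection can be sketched in a few lines. The sketch below is illustrative only: the function names and the labels "toolow", "toohigh", and "valid" are hypothetical choices, and the confidence interval uses the normal approximation from the standard library (for a small number of trials a t-based interval would be wider).

```python
from statistics import NormalDist


def bound_check(value, low, high):
    """Classify a simulated value against empirical minimum/maximum bounds."""
    if value < low:
        return "toolow"
    if value > high:
        return "toohigh"
    return "valid"


def mean_confidence_interval(mean, stdev, n, level=0.95):
    """Normal-approximation confidence interval for the mean of n trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev / n ** 0.5
    return mean - half_width, mean + half_width


def contains(interval, value):
    """True when value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


# Made-up numbers: 16 trials with mean 10 and standard deviation 2,
# compared against a single empirical value V.
interval = mean_confidence_interval(10.0, 2.0, 16)
v_inside = contains(interval, 10.5)
v_outside = contains(interval, 12.0)
```

The symbolic label or the inside/outside result is what would then be recorded as a statement about the data stream.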


5.2.2  Curve  Comparison 

While value comparison is simple and useful, in some cases curves need to be compared.
There are two ways to compare curves: semantics-based and mathematical. The
mathematical approach assumes differentiable curves. It employs the methods of
curvature matching, tangent angles, and template matching. As the curves for our purpose
have meaningful parameterization (e.g., have a time axis), which is to say, they are not
purely geometric, the semantics of the curves helps in the comparison. As a result, a mix
of mathematical and semantics-based methods can be used.

The curves are compared by the following methods:

1. Magnitude comparison: whether a curve value is higher than a reference value
or than that of the other curve within a certain interval.

2. Trend/gradient comparison: whether and how fast a curve increases or
decreases.

3. Peak comparison: whether a curve peak is similar to that of another curve.

4. Curve shape comparison: the tangents, curvatures, and semantics are matched
and compared. (This is harder than the above.)

The results of curve comparison are validation values that say how valid a curve is
compared with a reference value or curve. For example, in N3 the influenza peak
comparison  can  be  noted  as: 

<simulated  influenza  peak>  <is  similar  to,  with  95%  validity>  <empirical 
influenza  peak> 
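As an illustration of peak comparison, the sketch below locates the peak of two sampled curves and combines the agreement in timing and height into one score. The scoring formula and the function names are assumptions made for this example, not the measure used by WIZER, and the curves are assumed to be non-negative and sampled on the same time axis.

```python
def peak(curve):
    """Return (time_index, value) of the highest point of a sampled curve."""
    t = max(range(len(curve)), key=lambda i: curve[i])
    return t, curve[t]


def peak_similarity(sim, emp):
    """Hypothetical score in [0, 1]; 1 means identical peak time and height."""
    ts, vs = peak(sim)
    te, ve = peak(emp)
    time_score = 1 - abs(ts - te) / max(len(sim), len(emp))
    scale = max(abs(vs), abs(ve))
    height_score = 1.0 if scale == 0 else 1 - abs(vs - ve) / scale
    return time_score * height_score


# Made-up weekly counts: same peak height, simulated peak one step early.
simulated = [0, 2, 5, 3, 1]
empirical = [0, 1, 4, 5, 2]
score = peak_similarity(simulated, empirical)
```

A score of this kind could then be reported as the validity percentage in a statement such as the influenza peak example above.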


5.2.3  Pattern  Comparison 

Patterns  are  more  complex  than  curves.  While  curves  are  more  or  less  continuous, 
patterns  can  change  abruptly  and  are  sometimes  intermittent.  As  patterns  for  our  purpose 
have  meaningful  parameterization,  the  semantics  of  the  patterns  is  used  to  help  with 
comparison. 

The results of the comparison are validation values, which in N3 can be noted as

<simulated pattern I> <is similar, with 95% validity> <empirical pattern B>


5.2.4 Statistical Comparison


When the empirical data samples are available, both the simulated data and the empirical
data can be characterized statistically. If a normal distribution can be assumed, the t-test is
used to compare the two sets of data. (The data are assumed to be independent samples.) If
no normal distribution can be assumed, the Wilcoxon rank-sum test is used.

The results of the statistical comparison are validation values, which in N3 can
be written as follows:

<simulated  data  stream>  <has  the  same  population,  with  100%  validity> 
<empirical  data  stream>  . 

<simulated  data  stream>  <uses>  <Wilcoxon  rank-sum  test>  . 


5.2.5  Conceptual  Comparison 

When a simulation has longitudinal data such as an agent’s history, the longitudinal data
can be compared conceptually or semantically with empirical data. For example, a typical
pattern of a K-12 child includes taking the school bus and going to school from Monday
to Friday during the school year. The simulation output for the average child behavior
history should conceptually match the pattern. Such patterns can be thought of as symbolic
patterns, as opposed to numeric patterns. The patterns are compared conceptually or semantically
by comparing the semantic descriptions of the empirical data and the simulation output.
In a modified N3, for example, the child activity pattern can be written as:

<a  chronological  pattern>  <consists  of>  <four  sequences>  . 

<a  K-12  child>  <takes>  <a  school  bus>  . 

<a  K-12  child>  <goes  to>  <school>  . 

<a  K-12  child>  <takes>  <a  school  bus>  . 

<a  K-12  child>  <goes  to>  <home>  . 

The comparison can be record-by-record, but a smarter comparison utilizing ontological
reasoning can be employed.

The result of the comparison is a set of validation values that indicate
whether the symbolic patterns match. In N3,

<simulated  child  behavior  history  for  Mondays>  <matches> 

<empirical  child  behavior  history  for  Mondays>  . 
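The record-by-record form of conceptual comparison can be sketched as below. This is an illustrative sketch only: `parse_triple` and `mismatches` are hypothetical helper names, the statements are assumed to follow the `<subject> <predicate> <object> .` layout of the modified N3 shown above, and no ontological reasoning (for example, recognizing synonyms) is attempted.

```python
from itertools import zip_longest


def parse_triple(line):
    """Split '<s> <p> <o> .' into a (subject, predicate, object) tuple."""
    parts = [p.split(">")[0].strip() for p in line.split("<")[1:]]
    if len(parts) != 3:
        raise ValueError(f"expected three terms: {line!r}")
    return tuple(parts)


def mismatches(simulated, empirical):
    """Indices at which two ordered lists of statements differ."""
    pairs = zip_longest(simulated, empirical)
    return [
        i
        for i, (s, e) in enumerate(pairs)
        if s is None or e is None or parse_triple(s) != parse_triple(e)
    ]


empirical = [
    "<a K-12 child> <takes> <a school bus> .",
    "<a K-12 child> <goes to> <school> .",
    "<a K-12 child> <takes> <a school bus> .",
    "<a K-12 child> <goes to> <home> .",
]
# A made-up simulated history in which the child ends the day elsewhere.
simulated = empirical[:3] + ["<a K-12 child> <goes to> <a park> ."]
differences = mismatches(simulated, empirical)
```

An empty list of differences corresponds to the `<matches>` statement; a non-empty list points to the records that an ontology-aware comparison would need to examine further.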


5.3  Performance  Evaluation  Criteria 


The speed and effectiveness of a validation are measured against the search space and the
knowledge space, including the ontology.


5.3.1  Reduction  of  the  Amount  of  Searching 

The  reduction  of  the  amount  of  searching  is  simply  measured  by: 

1. The size of the search space before the application of WIZER inference,

2. The size of the search space that actually needs to be searched when WIZER is
applied.

Dividing (2) by (1) gives the fraction of the search space that remains; one minus this
ratio is the proportion of search reduction achieved by WIZER.
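A small worked example may make the measure concrete. The sketch assumes, purely for illustration, that the search space is a full grid over discretized parameter levels, so that its size is the product of the number of levels per parameter; the function names and the numbers are hypothetical.

```python
from math import prod


def search_space_size(levels_per_parameter):
    """Size of a full grid: the product of the number of levels per parameter."""
    return prod(levels_per_parameter)


def search_reduction(size_before, size_after):
    """Return (fraction remaining, fraction eliminated) of the search space."""
    remaining = size_after / size_before
    return remaining, 1 - remaining


# Three parameters with 10 levels each before inference; suppose inference
# singles out one parameter to vary while the other two stay fixed.
before = search_space_size([10, 10, 10])
after = search_space_size([10, 1, 1])
remaining, reduction = search_reduction(before, after)
```

In this made-up case only 1% of the grid remains to be searched, which corresponds to a 99% reduction.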


5.3.2  Showing  How  Knowledge  Can  Help  Focus  the  Search 

How  much  knowledge  can  help  focus  the  search  is  measured  by 

1. Knowledge bases and inferences,

2.  The  extent  of  search  space  before  the  application  of  knowledge  inference, 

3.  The  extent  of  search  space  after  knowledge  inference  of  WIZER. 

The comparison of (3) with (2) in light of (1) produces the focus “quality” for a particular
body of knowledge and inference by WIZER.


5.4 Model-Improvement Evaluation Criteria


Whether the simulation model is improved after the addition or deletion of simulation
links by the model-improvement module is determined by comparing the validity of the
simulation before and after the change. This comparison is guided by
ontology or semantics. In other words, the many validation values for various data
streams (which are described semantically) are examined for their relative importance by
the ontology or semantics. Furthermore, while a single “total” validity value can be
calculated, utilizing ontology and semantics to weigh and assess the true significance of
various validation values is a more sensible method. When the model consists of causal
relations, the comparison of models before and after adjustments, aided by ontology,
indicates comparative causality.


5.5  Summary 


This  chapter  describes  knowledge-based  evaluation  criteria  for  validation,  performance, 
and  model-improvement.  It  describes  how  curves  and  other  simulation  outputs  are 
compared. Knowledge and ontological inference allow WIZER to prune and focus the
search  space  for  validation  and  model-improvement. 


Chapter VI: BioWar TestBed


WIZER is applied to partially validate a multi-agent social-network model called
BioWar. BioWar (Carley et al. 2003) is a model capable of simulating the effects of
weaponized biological attacks on a demographically-realistic population with a
background of naturally-occurring diseases. It integrates principles of epidemiology
(Anderson and May 1991, Lawson 2001, Bhopal 2002), health science (Cromley and
McLafferty 2002), geography, demographics, sociology, social networks (Wasserman
and Faust 1994), behavioral science, and organization science.

In this chapter, I show how WIZER is used to partially validate BioWar in two
validation scenarios. The validity of the results is described and discussed.


6.1  Description  of  BioWar 


BioWar simulates what persons do in their daily lives before and after they are infected with
diseases in a city. The following figure partially shows the causal relationships among the
simulation entities in BioWar. Note that the arrow direction means “may cause or
influence”. These relationships are put into the knowledge base for the WIZER Inference
Engine.

Figure 7. Partial Causal Diagram of BioWar


To facilitate descriptions without the need to draw graphs and for automation, however,
we could use a simple syntax in the form of:

(setvalue predicate value): set the value of a variable/predicate to “value”.

(setstdev predicate value): set the standard deviation in the value of the predicate
set previously by the setvalue operator to “value”.

(setbelief predicate value): set the probability of the value of the predicate
being correct to “value”.

(setpriority predicate value): set the priority of the variable/predicate. This
priority determines the order by which rules and/or entities are examined.

(setchangeable predicate value): set the degree by which a rule and/or
an entity can change in value.

(causes predicate predicate): set the causation relationship relating
two variables/entities.

(convertible predicate predicate): define that the values of two variables/entities
can be transformed mathematically to each other.

(if-then (predicate value) (predicate value)): set the if-then relationship
between two variables/nodes.

Here predicate is a node/variable/entity, and value is any Boolean, integer, real,
qualitative, or enumerated value where applicable. The special/predetermined predicates
include “causes”, “if-then”, and “convertible”, where causes denotes causal relations,
if-then denotes if-then relations, and convertible denotes that the values can be
mathematically converted to each other. A prefix of “op-” in the predicates means the
predicates modify values in the working memory of the forward-chaining mechanism.
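To show how such a syntax lends itself to automation, the following is a minimal sketch of a reader for these statements. It is not the WIZER parser: the function names and the dictionary layout of the resulting knowledge base are assumptions made for this example, and error handling for unbalanced parentheses is omitted.

```python
def tokenize(text):
    """Split the text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one parenthesized expression into nested Python lists."""
    token = tokens.pop(0)
    if token != "(":
        return token
    expr = []
    while tokens[0] != ")":
        expr.append(parse(tokens))
    tokens.pop(0)  # discard the closing parenthesis
    return expr


def load(text):
    """Parse every top-level statement in text."""
    tokens = tokenize(text)
    statements = []
    while tokens:
        statements.append(parse(tokens))
    return statements


def as_value(atom):
    """Keep qualitative values as strings; convert numeric ones to float."""
    try:
        return float(atom)
    except ValueError:
        return atom


def build_kb(statements):
    """Group statements into attributes, relations, and if-then rules."""
    kb = {"attributes": {}, "causes": [], "convertible": [], "rules": []}
    for op, *args in statements:
        if op.startswith("set"):
            name, value = args
            kb["attributes"].setdefault(name, {})[op[3:]] = as_value(value)
        elif op in ("causes", "convertible"):
            kb[op].append(tuple(args))
        elif op == "if-then":
            kb["rules"].append((tuple(args[0]), tuple(args[1])))
    return kb


example = """
(setvalue base-rate 0.2)
(setpriority base-rate 3)
(causes base-rate infection-rate)
(if-then (toolow actual-incidence) (op-higher base-rate))
"""
kb = build_kb(load(example))
```

Storing each `set...` operator under the predicate it modifies keeps the value, belief, and priority of a variable together, which is convenient when rules are later ranked.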

The figure below shows one process model related to the causal model of BioWar
shown above in Figure 7. This process model elucidates the causal relation of agent's
infection and agent's symptom severity in the causal diagram. It is applied to one
individual, instead of a population sample. It is general enough to capture the processes
of most infectious diseases.


[Diagram: states Susceptible, Infected, Communicable, Symptomatic, and Treated or Dead]

Figure 8. A Process Model for Infectious Diseases

As shown, the process starts in the susceptible state/phase, then transitions to the infected
state, the communicable/contagious state (a period when a person can infect others), and the
symptomatic state. These phases exit to the state of either treated or dead. Note that while
the rectangles seem to suggest the states are distinct, here they are not. (Thus the
process diagram is not the same as a finite state machine, as the states of a finite state
machine do not overlap.) The infected state or phase brackets both the communicable and
symptomatic phases. The communicable and symptomatic phases can overlap. As an
example, for influenza, an adult person can be in the communicable phase 1 day before
being in the symptomatic phase and continue to be in the communicable phase until 5
days after the initial infection. Treatment for influenza only minimizes the symptoms, and
does not cure the disease. For smallpox, the incubation period lasts 7 to 17 days, during
which a person is not communicable/contagious, followed by the symptomatic and
communicable phases at the same time. This symptomatic phase is further divided into
initial symptoms (prodrome), which last 2-4 days, early rash for about 4 days, pustular
rash for about 5 days, pustules and scabs for about 5 days, resolving scabs for about 6
days, and then finally resolved scabs. (The subphases are modeled in a manner similar to
the process model for phases.) The early rash is the most contagious subphase, while the
other subphases are contagious except for the resolved scabs phase. All this symbolic and
semantic information is critically important to the process model. Augmenting the
process model with symbolic and semantic information produces the process logic.
Process logic is defined as the sequenced or ordered events based on the process model
augmented by semantic information and ontology.
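Because the phases may overlap, they are more naturally represented as time intervals than as mutually exclusive states. The sketch below illustrates this for the smallpox description above. It is an assumption-laden example, not BioWar code: the function names are hypothetical, the incubation period is fixed at 12 days (the text gives a range of 7 to 17), and the prodrome is fixed at 3 days (the text gives 2-4).

```python
def timeline(durations, start=0):
    """Turn ordered (name, days) pairs into (name, first_day, end_day) spans."""
    spans, day = [], start
    for name, days in durations:
        spans.append((name, day, day + days))
        day += days
    return spans


def active(spans, day):
    """Names of all spans whose half-open interval [first, end) covers day."""
    return [name for name, first, end in spans if first <= day < end]


INCUBATION = 12  # assumed value; the text gives 7 to 17 days
subphases = timeline(
    [
        ("prodrome", 3),  # assumed value; the text gives 2-4 days
        ("early rash", 4),
        ("pustular rash", 5),
        ("pustules and scabs", 5),
        ("resolving scabs", 6),
    ],
    start=INCUBATION,
)
end_of_contagion = subphases[-1][2]

# The infected phase brackets the symptomatic and communicable phases,
# which for smallpox begin together once incubation ends.
phases = [
    ("infected", 0, end_of_contagion),
    ("symptomatic", INCUBATION, end_of_contagion),
    ("communicable", INCUBATION, end_of_contagion),
]

during_incubation = active(phases, 5)
during_rash = active(phases, 20)
```

Querying a day returns every phase that holds on that day, so overlapping phases need no special handling.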


6.2 The Need for Automated Validation


BioWar has a large number of model variables. The interactions between variables
influence the outcome.

Because BioWar follows the Spiral Development model (Boehm 2000), the model changes
from one iteration to the next, so previous validation of model predictions may no longer hold.
Furthermore, the assumptions behind
large scale multi-agent models are complex, often not stated, and not operable (not
suitable for computer processing and automation). Changing parameter values without
understanding the assumptions and reasoning behind them can lead to unforeseen results.

To address the above problems, automated validation with assumption
tracking is needed. Here WIZER plays a vital role by explicitly stating the
assumptions and reasoning behind changes in parameter and model values in relation to
the validation process.


6.3 WIZER as Applied to BioWar


WIZER takes the BioWar outputs and occurrences as input (e.g., school absenteeism,
work absenteeism, doctor visits, emergency room visits, pharmacy visits, drug purchases,
and  agent  health  statistics)  along  with  the  context  of  the  simulation  (e.g.,  city  layout, 
demographics  distribution,  school  district  boundaries,  calendar  of  events,  and  agent 
occupations)  and  the  corresponding  empirical  data.  After  comparing  the  simulation 
outputs  with  the  empirical  data,  WIZER  performs  inferences  to  decide  what  parameters 
need  to  change  so  that  the  next  simulation  would  be  the  most  reasonable  search  step 
toward  increased  validity.  As  more  diverse  types  of  empirical  data  are  compared,  the 
chance  of  different  parameter  values  fitting  all  empirical  data  gets  smaller. 


6.4  Data  Sources  for  Validation 


Getting access to data for validating BioWar is non-trivial. In this dissertation, limited
data are used. The following are the data streams that can be used by WIZER. Note that
both the input and output data sources for BioWar are listed, because both can be used in
WIZER. Note that not all validation scenarios require all the data sources.

o  Hospital  and  park  locations  from  GNIS  database,  http://geonames.usgs.gov 
o  Demographics from Census Bureau’s Summary File 1,
http://factfinder.census.gov/home/en/sf1.html
o  Work,  medical,  recreation  location  counts  from  Census  Bureau’s  Economic 
Census,  http://www.census.gov/econ/www/econ  cen.html 
o  Cartographic  boundaries  from  Census  Bureau, 
http://www.census.gov/geo/www/cob 
o  School  demographics  and  locations  from  NCES’  CCD  database, 
http://nces.ed.gov/ccd 

o  Student  absenteeism  statistics,  http://nces.ed.gov/pubsearch 
o  Social network characteristics from GSS,
http://www.icpsr.umich.edu:8080/GSS/homepage.htm
o  Climate and wind data from NCDC at NOAA,
http://www.ncdc.noaa.gov/oa/ncdc.html
o  Disease symptoms and diagnosis model from Internist I (Miller et al. 1982)
o  Medical visit, mortality and morbidity statistics from CDC’s NCHS surveys,
http://www.cdc.gov/nchs
o  Disease timing and symptoms from CDC,
http://www.cdc.gov/publications.htm
o  CDC weekly report for influenza
o  Influenza data from http://www.wrongdiagnosis.com/f/flu/stats.htm and
NIAID, the National Institute of Allergy and Infectious Diseases
o  SDI (Surveillance Data Inc) emergency room registration data
o  DARPA SDI hospital and clinics visit data for five cities
o  DARPA ADS hospital and clinics visit data for five cities
o  DARPA PDTS hospital and clinics visit data for five cities


6.5  Validation  Scenarios 


I examine two validation scenarios. Validation Scenario I examines influenza
incidence (and thus prevalence and death rate) in relation to several input parameters
such as the ailment exchange proximity threshold. Validation Scenario II examines the
relative timing of the peaks of the children's absenteeism curve, the over-the-counter drug
purchase curve, and the incidence curve. Empirical data is gathered from
http://www.wrongdiagnosis.com/f/flu/stats.htm and the National Institute of Allergy and
Infectious Diseases (NIAID).


6.5.1  Validation  Scenario  I:  Incidence  Factors 

This  scenario  examines  the  simulated  incidence  compared  to  the  empirical  observed 
incidence  for  influenza  in  relation  to  the  input  parameters  of  initial  rate  of  spread,  ailment 
effective  radius,  and  ailment  exchange  proximity  threshold. 

Prevalence  and  incidence  are  measures  of  a  disease's  occurrence.  The  prevalence 
of  a  condition  denotes  the  number  of  people  who  currently  have  the  condition,  while  the 
incidence  refers  to  the  annual  number  of  people  who  have  a  case  of  the  condition.  A 
cumulative  sum  of  incidence  yields  prevalence,  on  the  condition  that  other  factors  are 
assumed to have no effects. Influenza has a high incidence but low prevalence, while
diabetes, a chronic incurable disease, has a high prevalence but low incidence.

In the simulation, we have a God’s-eye view of incidence and prevalence,
so the actual incidence and actual prevalence can be known. In the real world, only
observed incidence and
prevalence are known (in simulation these are mirrored by the simulated observed
incidence and observed prevalence variables).

The variables and output values for this scenario are as follows.

(1) Outputs for empirical matching: I choose the simulated actual
incidence to match with the empirical data, given that other measures
are more or less the same for the purpose of inference. That is to say,
I could use the simulated observed incidence, but this adds another
factor (the rate by which incidence is “observed” in simulation) which
is non-critical. I could also use prevalence, but this also adds more
factors such as disease recovery rate. As the empirical data I have
from NIAID, the National Institute of Allergy and Infectious Diseases,
is the observed incidence data, I do not compare it with the simulated
observed incidence data, to keep things simple (there is no duplicate
“observation”). Instead, I compare it with the simulated actual
incidence data.

(2) Variables: base-rate (the rate of infections among susceptible persons
exposed to a disease release), ailment effective radius (the radius from
the center of disease agent release within which persons can get infected
initially), and ailment exchange proximity threshold (the distance over
which the probability of ailment transmission decreases significantly).

The  knowledge  base  is  as  follows. 

The  causal  conceptual  diagram: 

(causes  ailment-effective-radius  infection-rate) 

(causes ailment-exchange-proximity-threshold infection-rate)

(causes  base-rate  infection-rate) 


(convertible infection-rate actual-incidence)

Note that the optional mechanisms/processes underlying causal relations are not used in
this scenario. The mechanisms can be represented as a table, a function, or pseudocode.

The rules related to the causal relations are as follows. The “op-” prefix in a
predicate denotes an operator predicate, which changes the value of the variable.

(if-then (toolow actual-incidence) (op-higher ailment-effective-radius))

(if-then (toohigh actual-incidence) (op-lower ailment-effective-radius))

(if-then (toolow actual-incidence)
(op-higher ailment-exchange-proximity-threshold))

(if-then (toohigh actual-incidence)
(op-lower ailment-exchange-proximity-threshold))

(if-then (toolow actual-incidence) (op-higher base-rate))

(if-then (toohigh actual-incidence) (op-lower base-rate))

The  simulation  instantiations  of  variables  are  as  follows. 

(setvalue  base-rate  0.2) 

(setbelief  base-rate  0.5) 

(setpriority  base-rate  3) 

(setvalue ailment-effective-radius 1000)

(setbelief ailment-effective-radius 0.1)

(setpriority ailment-effective-radius 1)

(setvalue ailment-exchange-proximity-threshold 1000)

(setbelief ailment-exchange-proximity-threshold 0.2)

(setpriority ailment-exchange-proximity-threshold 3)

The simulation instantiations of outputs are as follows. The BioWar simulator is run for
10 trials for 100%-scale Hampton city. Part of the Alert WIZER module computes the
statistical descriptions of simulated actual-incidence from the 10 simulation trials. It gives
a mean of 0.0821 and a standard deviation of 0.0015 for simulated actual-incidence.

(setvalue actual-incidence 0.0821)

(setstdev actual-incidence 0.0015)

(setbelief actual-incidence 1.0)

The empirical data is as follows:

(setvalue  emp-observed-incidence-lowval  0.10) 

(setvalue  emp-observed-incidence-highval  0.20) 

This scenario is based on the comparison of simulated actual-incidence of
influenza with the empirical data from NIAID. According to the empirical data, 10% (the lower
bound) to 20% (the higher bound) of people contract influenza each year. The simulated
average actual incidence over 10 runs of 100%-scale Hampton (population 142,561 persons) is
8.21% of people per year.

The  Alert  WIZER  module  compares  the  simulation  instantiation  of  the  output 
actual-incidence  with  the  empirical  observed  incidence.  The  Inference  Engine  performs 
rule inferences based on the symbolic results of the comparison. After conflict resolution
based on the priority value (here other weighting factors are not considered), it gives the
inference of:

(toolow  actual-incidence) 

(op-higher  ailment-effective-radius) 

The  inference  is  that  the  ailment  effective  radius  should  be  increased.  How  much  the 
increase  should  be  is  determined  by  domain  knowledge,  ontology,  and  experiment 
design.  Absent  these,  the  value  is  simply  determined  by  a  simple  divide-and-conquer 
mathematical  routine,  assuming  that  the  parameter  is  more  or  less  monotonic.  If  it  is  not 
monotonic,  the  routine  degenerates  to  random  search  like  the  Monte  Carlo  method. 
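The divide-and-conquer fallback can be sketched as a bisection over the parameter interval. The sketch below is an illustration under stated assumptions, not the WIZER routine: `adjust` is a hypothetical name, the output is assumed to increase monotonically with the parameter, and `toy_model` is a made-up stand-in for a simulator run (it is not BioWar and its numbers carry no empirical meaning).

```python
def adjust(simulate, low_bound, high_bound, lo, hi, max_runs=20):
    """Bisect the parameter interval [lo, hi] until simulate(x) is in bounds.

    Assumes simulate is monotonically increasing in x on [lo, hi].
    Returns (parameter, output), or None if no value is found in max_runs.
    """
    for _ in range(max_runs):
        x = (lo + hi) / 2
        y = simulate(x)
        if y < low_bound:
            lo = x  # output too low: search the upper half
        elif y > high_bound:
            hi = x  # output too high: search the lower half
        else:
            return x, y
    return None


def toy_model(radius):
    """Made-up monotonic stand-in for a simulation run."""
    return radius / 10000


result = adjust(toy_model, 0.10, 0.20, lo=1000, hi=2000)
```

Each iteration stands for one batch of simulation trials, so halving the interval keeps the number of expensive runs small as long as the monotonicity assumption holds.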

In the next simulation cycle, the ailment effective radius is increased from 1000
meters to 1500 meters, based on a rough estimate of the extent of the area in which the
ailment would affect people, as encoded in the knowledge base for the value/link
adjustment routine. This represents a change of 50%. BioWar is re-run for 10 trials for
the same 100%-scale Hampton city and then WIZER is re-run, and the results are:

(setvalue actual-incidence 0.1482)

(setstdev  actual-incidence  0.0156) 

The  empirical  data  is  again  as  follows: 

(setvalue  emp-observed-incidence-lowval  0.10) 

(setvalue emp-observed-incidence-highval 0.20)

The Inference Engine responds with the notice:

(op-valid  actual-incidence) 

This means that BioWar now generates simulated incidence levels that fall within the
empirical bounds of observed incidence. It indicates that WIZER's inferences can be used
to increase the validity of a model such as BioWar.

The  following  table  summarizes  the  simulated  incidence  rate  before  and  after 
parameter  value  change  as  compared  to  the  empirical  bounds  of  observed  incidence  rate. 


Table 4, Simulated Incidence Rate before and after Change

                 Empirical      Empirical       Simulated rate   Simulated rate
                 lower bound    higher bound    before change    after change
Incidence rate   0.10           0.20            0.08             0.15

As shown in the table, the simulated incidence rate is moved to within the empirical
bounds by WIZER.

6.5.2 Validation Scenario II: Absenteeism and Drug Purchase Curves


This scenario examines the timing of the peaks of the children's school absenteeism and
drug purchase curves relative to the peak of the incidence curve.

The  variables  and  output  values  for  this  scenario  are  as  follows. 

(1) Outputs for empirical matching: I choose the simulated actual incidence,
the school absenteeism, and the influenza drug purchase curves. The Alert
WIZER finds the peaks of the curves and computes the time differences
between the peaks.

(2) Variables: as the onset of absenteeism is influenced by symptom onset and
symptom severity, these two factors form the variables. In addition to
being influenced by these two factors, the onset of influenza drug purchase
is influenced by the going-to-pharmacy behavioral threshold. Thus, the
variables for this scenario (with some simplifications) are symptom-onset,
symptom-severity, and going-to-pharmacy-threshold.


The  knowledge  base  is  as  follows. 

The causal conceptual diagram:

(causes  symptom-onset  absenteeism-onset) 

(causes  symptom-severity  absenteeism-onset) 

(causes symptom-onset drug-purchase-onset)

(causes  symptom-severity  drug-purchase-onset) 

(causes  going-to-pharmacy-threshold  drug-purchase-onset) 

(convertible infection-rate incidence-rate)

The onsets are computed against the time of infection. Note that the optional mechanisms
underlying causal relations are not used in this scenario. The mechanisms can be
represented as a table, a function, or pseudocode.

The  rules  related  to  the  causal  relations  are  as  follows.  The  “op”  prefix  denotes 
the  operand  predicate  which  changes  the  value  of  the  variable. 

(if-then  (toosoon  absenteeism-onset)  (op-lengthen  symptom-onset)) 

(if-then  (toolate  absenteeism-onset)  (op-shorten  symptom-onset)) 
(if-then (toosoon absenteeism-onset) (op-lower symptom-severity))

(if-then  (toolate  absenteeism-onset)  (op-higher  symptom-severity)) 

(if-then  (toosoon  drug-purchase-onset)  (op-lengthen  symptom-onset)) 

(if-then  (toolate  drug-purchase-onset)  (op-shorten  symptom-onset)) 

(if-then  (toosoon  drug-purchase-onset)  (op-lower  symptom-severity)) 

(if-then  (toolate  drug-purchase-onset)  (op-higher  symptom-severity)) 

(if-then  (toosoon  drug-purchase-onset)  (op-higher  going-to-pharmacy-threshold)) 
(if-then  (toolate  drug-purchase-onset)  (op-lower  going-to-pharmacy-threshold)) 

(if-then  (tooshort  absenteeism-vs-actual-incidence) 

(op-toosoon  absenteeism-onset)) 

(if-then  (toolong  absenteeism-vs-actual-incidence) 

(op-toolate  absenteeism-onset)) 

The  simulation  instantiations  of  variables  are  as  follows. 

(setvalue  symptom-onset  2) 

(setbelief  symptom-onset  0.5) 

(setpriority  symptom-onset  3) 

(setvalue  symptom-severity  3) 

(setbelief  symptom-severity  0.1) 

(setpriority  symptom-severity  1) 

(setvalue  going-to-pharmacy-threshold  100) 

(setbelief  going-to-pharmacy-threshold  0.2) 

(setpriority  going-to-pharmacy-threshold  2) 
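
The rules and priorities above can be chained mechanically. The following Python sketch shows one possible reading and is not WIZER's inference engine. It makes two assumptions that the text does not state: an intermediate conclusion such as `(op-toolate absenteeism-onset)` asserts the fact `(toolate absenteeism-onset)`, and a smaller `setpriority` number means a higher priority. Only the absenteeism rules are included, and the names `RULES`, `PRIORITY`, `infer`, and `resolve` are hypothetical.

```python
# Rules copied from the knowledge base above, as (condition, conclusion) pairs.
RULES = [
    (("tooshort", "absenteeism-vs-actual-incidence"), ("op-toosoon", "absenteeism-onset")),
    (("toolong", "absenteeism-vs-actual-incidence"), ("op-toolate", "absenteeism-onset")),
    (("toosoon", "absenteeism-onset"), ("op-lengthen", "symptom-onset")),
    (("toolate", "absenteeism-onset"), ("op-shorten", "symptom-onset")),
    (("toosoon", "absenteeism-onset"), ("op-lower", "symptom-severity")),
    (("toolate", "absenteeism-onset"), ("op-higher", "symptom-severity")),
]

# Priorities from the setpriority declarations above.
PRIORITY = {
    "symptom-onset": 3,
    "symptom-severity": 1,
    "going-to-pharmacy-threshold": 2,
}


def infer(alert):
    """Forward-chain from a symbolic alert to candidate parameter operations."""
    facts = {alert}
    frontier = [alert]
    actions = []
    while frontier:
        current = frontier.pop()
        for condition, (op, variable) in RULES:
            if condition != current:
                continue
            if variable in PRIORITY:
                actions.append((op, variable))  # operation on an adjustable parameter
            else:
                derived = (op[len("op-"):], variable)  # assumed: op-X asserts fact X
                if derived not in facts:
                    facts.add(derived)
                    frontier.append(derived)
    return actions


def resolve(actions):
    """Conflict resolution: keep the action whose variable has the smallest priority number."""
    return min(actions, key=lambda action: PRIORITY[action[1]])


candidates = infer(("toolong", "absenteeism-vs-actual-incidence"))
print(candidates)
print(resolve(candidates))
```

If the opposite priority convention were intended, `min` would simply become `max`; the chaining itself is unaffected.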

The simulation instantiations of outputs are as follows. One simulation of
Hampton city at 100% scale is run. The Alert WIZER computes the peaks of the
actual-incidence, school absenteeism, and drug purchase curves. It produces the timing
of the peaks relative to the actual-incidence peak. The following figure shows the
actual-incidence curve.


[Figure: Influenza Incidence (x-axis: day)]

Figure 9, The Peak of Incidence Occurs on Day 128

As  shown,  the  peak  of  incidence  occurs  on  Day  128.  Day  1  is  the  start  of  the  simulation, 
corresponding  to  September  1,  2002.  The  peak  is  computed  by  finding  the  maximum 
point  and  averaging  the  data  points  within  a  fixed  time  interval  around  the  maximum 
point  time.  This  assumes  no  outliers. 
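
The peak computation can be sketched as follows. This Python example is one interpretation of the description above, not the Alert WIZER code: it takes the day of the maximum as the peak time and the average over a fixed window around it as the peak level. The function name `find_peak`, the window size, and the sample series are hypothetical.

```python
def find_peak(series, half_window=3):
    """Return (peak_day, smoothed_level) for a daily series where Day 1 is index 0."""
    peak_index = max(range(len(series)), key=lambda i: series[i])
    start = max(0, peak_index - half_window)
    end = min(len(series), peak_index + half_window + 1)
    window = series[start:end]
    # A single outlier would win the max above, hence the no-outliers assumption.
    return peak_index + 1, sum(window) / len(window)


# Hypothetical daily counts, used only to exercise the function.
counts = [0, 1, 3, 7, 12, 9, 4, 1]
day, level = find_peak(counts, half_window=1)
print(day, level)
```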


The  Relative  Timing  of  School  Absenteeism  Peak 

In the simulation trial, the simulated absenteeism peak lags the simulated
actual-incidence peak by 10 days.

(setvalue absenteeism-vs-actual-incidence 10)

(setbelief  absenteeism-vs-actual-incidence  1.0) 

According to CDC data, the incubation period for influenza is 1-4 days. Absenteeism occurs
a day after the end of incubation. Thus, the empirical data is as follows:

(setvalue  emp-absenteeism-vs-actual-incidence-lowval  2) 

(setvalue  emp-absenteeism-vs-actual-incidence-highval  5) 

The  following  figure  shows  the  school  absenteeism  curve. 


[Figure: School Absenteeism]

Figure 10, The Peak of School Absenteeism Occurs on Day 138


As shown, the peak of school absenteeism occurs on Day 138. The curve is broken on
Saturdays and Sundays as the schools are closed. Days 115-121 are holidays. The peak is
computed by finding the maximum average of weekly data and then averaging a few data
points within set time intervals around the maximum point. This of course assumes that
there are no outliers.

The inference engine compares the relative timing of absenteeism and incidence
peaks with the empirical relative timing. After conflict resolutions based on the priority
value (here other weighting factors are not considered), it produces the inference of:

(toolong absenteeism-vs-actual-incidence)

(op-higher  symptom-severity) 

because the absenteeism peak lags 10 days behind the incidence peak, twice as long as
the empirical maximum of 5 days.

The  inference  is  that  the  symptom-severity  (the  relative  timing  and  magnitude  of 
manifested  symptoms)  should  be  increased.  How  much  the  increase  should  be  is 
determined  by  domain  knowledge,  ontology,  and  experiment  design.  Absent  these,  the 
value  is  simply  determined  by  a  simple  divide-and-conquer  algorithm. 

For the next cycle of simulation, the symptom severity is increased by 100% by
the value/link adjustment routine using an encoded rule about critical point heuristics.
BioWar is re-run and then WIZER is re-run. The following figure shows the resulting
curve of school absenteeism.


[Figure: School Absenteeism after Parameter Value Change (x-axis: day)]

Figure 11, The Peak of School Absenteeism after Change Occurs on Day 132

As shown, the peak of school absenteeism now occurs on Day 132. The Inference Engine
compares the relative timing of absenteeism and incidence peaks with the maximum
empirical relative timing. After conflict resolutions are performed, it now produces the
inference of:

(within-range  absenteeism-vs-actual-incidence) 

(op-valid) 

The  relative  time  difference  between  absenteeism  and  actual-incidence  peaks  is  now  4 
days,  which  is  less  than  the  previous  cycle's  relative  time  difference  of  10  days.  It  is  now 
one  day  shorter  than  the  maximum  empirical  time  difference.  Thus  the  peak  of  school 
absenteeism  is  moved  to  the  valid  range  within  the  empirical  bound  of  2-5  days.  So  the 
Inference  Engine  produces  a  notice  that  the  simulated  absenteeism  curve  peak  is  now 
valid. 

The  following  figure  shows  the  school  absenteeism  curves  before  and  after 
parameter  value  change. 


[Figure: School Absenteeism (legend: before, after)]

Figure 12, School Absenteeism Curves before and after Parameter Value Change

As shown, the absenteeism peak after the parameter value change is closer to the time at
which the incidence peaks (shown by the black vertical line) than the before-change
absenteeism peak.


The  Relative  Timing  of  Drug  Purchase  Peak 


The next comparison of curve peaks is between the drug purchase curve and the
incidence curve. I first show the baseline case, before any parameter value change. The
following figure shows the purchase curve for cold/cough medication bought for influenza
before the change.

[Figure: Drug Purchase at Pharmacies]

Figure 13, The Peak of Drug Purchase for Influenza Occurs on Day 139

As shown, the peak of drug purchase for influenza occurs on Day 139. The peak occurs
11 days after the incidence peak.

The incubation period for influenza is 1-4 days, and the illness typically resolves
after 3-7 days for the majority of persons, according to the CDC
(http://www.cdc.gov/flu/professionals/diagnosis). A typical case of influenza therefore
lasts at most 11 days.

Assume that the symptoms show up on Day 5 (after up to 4 days of incubation) and that on
the following day (Day 6) the parents go to pharmacies and buy influenza medication for
their children. This means the peak of drug purchase must occur 6 days after the
incidence peak, which is to say, on Day 134 (128 + 6). The simulated drug purchase peak
on Day 139 is therefore too late by 5 days.

The  WIZER  Inference  Engine  yields: 

(toolate  drug-purchase-onset) 

(op-shorten  symptom-onset) 

(op-higher  symptom-severity) 

(op-lower going-to-pharmacy-threshold)

After  conflict  resolution  by  the  priority  measure,  it  yields: 

(op-higher  symptom-severity) 

Again, how much the symptom-severity should increase is determined by knowledge
base, ontology, and experiment design.

For the next cycle of simulation, the symptom severity is increased by 100% by
the value/link adjustment routine using simple heuristic knowledge encoded in its rules.
BioWar is re-run and then WIZER is re-run. The result for the drug purchase curve is shown
in the following figure.


[Figure: Drug Purchase after Parameter Value Change (x-axis: day)]

Figure 14, The Peak of Influenza Drug Purchase after Change Occurs on Day 135

As shown, the peak of drug purchase for influenza after the parameter value change now
occurs on Day 135. This is 3 days after the peak of the after-change school absenteeism
and 7 days after the incidence peak. The lag between the incidence peak and the drug
purchase peak is thus one day longer than the empirical maximum of 6 days.

The  WIZER  Inference  Engine  yields: 

(toolate  drug-purchase-onset) 
and  after  conflict  resolution,  it  produces: 

(op-higher  symptom-severity) 

Thus the drug purchase peak has been moved to be within one day of the maximum
empirical range of the time difference between the incidence and the drug purchase
peaks. The following figure shows the drug purchase curves and peaks before and after
parameter value change. Also shown is the incidence curve and peak. The Y-axis unit
denotes either the drug purchase unit for drug purchase curves or the number of
incidences for the incidence curve.


[Figure: Drug Purchase before and after Change]

Figure 15, Drug Purchase Curves before and after Parameter Value Change

As shown, the drug purchase peak is moved closer to the incidence peak after the
parameter value change. It now falls only one day after the latest time at which,
empirically, the drug purchase should peak. For the next value adjustment, WIZER makes
only a slight change to the symptom severity value, because the simulated lag between
the drug purchase and incidence peaks exceeds its maximum empirical range by one day
instead of the previous cycle's 5 days.

6.6 Validation Measures


Validation is measured based on a piece of knowledge that corresponds to a data stream.
For the results above:

1.  Incidence  Factors:  the  simulated  actual  incidence  rate  is  lower  than  the  lower 
bound  of  the  empirical  observed  incidence  rate.  Strictly  speaking,  the  data  stream 
output  is  not  valid.  After  value  change  by  WIZER,  the  simulated  incidence  rate  is 
moved  to  be  within  the  empirical  range,  achieving  validity. 

2. School Absenteeism: the simulated school absenteeism peak occurs later than it
should. Thus this data stream has zero validity, strictly speaking. But the
comparison  of  the  shape/trend  of  the  curves  seems  to  indicate  that  the  validity 
level  is  much  higher  than  zero.  Furthermore,  after  value  change  by  WIZER,  the 
simulated  absenteeism  peak  is  moved  to  be  within  the  empirical  range,  achieving 
validity. 

3. Drug Purchase: the simulated drug purchase peak also occurs later than it
should. Strictly speaking, this data stream is invalid. Again, if the shape of the curves
is  compared,  it  looks  like  the  validity  level  for  this  data  stream  is  much  higher 
than  zero.  After  one  cycle  of  value  change  by  WIZER,  the  drug  purchase  peak  is 
moved  to  be  within  one  day  of  its  maximum  empirical  range. 

The results illustrate that validation is an iterative process. In addition to the need
to perform multiple runs, there can be multiple measures of validity.

6.7 WIZER versus Response Surface Methodology for
BioWar Validation


BioWar has hundreds of parameters. The resulting parameter space is gigantic. Suppose
that the Response Surface Methodology or RSM (Myers and Montgomery 2002, Carley,
Kamneva, and Reminga 2004) is used to completely characterize BioWar for validation.
Given that BioWar is estimated to have 200 parameters (a conservative number) and
assuming that each parameter can take 3 different values (3 levels), the parameter space
has 3^200 cells, which is unmanageable with current technology. As BioWar is
stochastic, each cell requires 40 virtual experiments to get statistically significant
results, a 40-fold increase in the number of simulation runs. Quantum computers might
someday make the execution of 40 x 3^200 simulations feasible, but not today.

Experimenters, of course, can divide the system into modules and validate it
module by module, assuming that the system has some modularity and that all other
modules have reasonable parameter values. If this is done for BioWar, experimenters can
probe the relationships between incidence rate and infection factors such as
ailment_effective_radius, ailment_exchange_proximity_threshold, and base_rate.
Assuming each of these factors has 3 or 4 levels (possible values), the following table
shows the number of cells required.


Table 5, Number of Cells for Validation of Incidence Factors

Parameter                            Categories            Size
Ailment effective radius             500, 1000, 1500       3
Ailment exchange proximity radius    500, 1000, 1500       3
Base rate                            10%, 30%, 50%, 70%    4

As shown, the total number of cells required is 3 x 3 x 4 = 36 for a non-stochastic
program. Being stochastic, BioWar requires 36 x 40 = 1,440 virtual experiments, which is
perfectly manageable. How experimenters choose the parameters and the parameter
levels, however, is entirely ad hoc, implicit, and unusable for computer operation.
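
The cell counts in this section follow from simple arithmetic, which the short Python sketch below reproduces. It uses the level counts listed in Table 5 (3, 3, and 4) and the 40 runs per cell stated above; the variable names are hypothetical.

```python
from math import prod

RUNS_PER_CELL = 40  # virtual experiments per cell for a stochastic model

# Full factorial design: 200 parameters at 3 levels each.
full_cells = 3 ** 200
full_runs = RUNS_PER_CELL * full_cells
print(f"full design: a {len(str(full_cells))}-digit number of cells")

# Modular design for the incidence factors, with level counts from Table 5.
levels = {
    "ailment effective radius": 3,
    "ailment exchange proximity radius": 3,
    "base rate": 4,
}
module_cells = prod(levels.values())
module_runs = module_cells * RUNS_PER_CELL
print(f"incidence module: {module_cells} cells, {module_runs} runs")
```

The full design has a 96-digit number of cells (roughly 2.66 x 10^95), while the modular design needs only 36 cells and 1,440 runs.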

WIZER enhances the way experimenters decide which parameters and what
parameter levels to choose by codifying the knowledge in a form that is clear, explicit,
and operable by computers. It is codified in the form of knowledge bases and ontology.
With its inference engine, WIZER can reason about parameters and simulation results,
producing new inferences, that is, inferences that no human experimenter has input or
thought of before. Furthermore, by using its knowledge inference, WIZER can reduce
the number of virtual experiments needed. The 1,440 virtual experiments required by RSM
above is the upper limit of what WIZER needs. Typically, WIZER needs fewer because of
the inferences it makes about simulation results after each simulation cycle.

6.8 Summary


WIZER is shown to be able to partially validate the BioWar simulation model. The
incidence factors and the relative timing of the school absenteeism peak are validated
using WIZER. The relative timing of the drug purchase peak is almost validated: it falls
within one day of the maximum empirical relative timing. The results show the use of
Alert WIZER to describe the output data streams/curves and compare them to produce
symbolic or semantic alerts.

The results show that while WIZER is capable of doing validation, validation
depends critically on the provided knowledge. This brings up the issues of knowledge
engineering and knowledge acquisition. Fortunately, as part of the process of simulation
model development and of validation, the task of both knowledge engineering and
knowledge acquisition can be handed over to the respective stakeholders: the task to
create simulation knowledge space to the simulation developers and the task to create
domain knowledge to the validators or the VV&A practitioners.

Chapter VII: CONSTRUCT Testbed


CONSTRUCT (Carley 1990, Carley 1991, Schreiber and Carley 2004) is a multi-agent
model of group and organizational behavior in the form of networks, capturing the
co-evolution of cognition (knowledge) and structure of said networks.

This chapter explains why automated validation for CONSTRUCT is desirable. It
also provides several partial validations of CONSTRUCT using WIZER. It shows one
simple case of hypothesis building and testing, which asks what happens if the
management is not homogeneous, contrary to what was assumed before.


7.1  Description  of  CONSTRUCT 


CONSTRUCT  adopts  the  constructural  theory  for  the  formation  of  social  structure  and 
knowledge.  This  means  when  a  person  interacts  with  another,  he/she  exchanges 
knowledge.  This  exchange  modifies  both  persons’  store  of  knowledge.  The  change  in  the 
store  of  knowledge  in  turn  affects  with  whom  a  person  will  interact  next.  Thus  both  the 
cognition  (knowledge)  and  structure  of  the  interaction  network  change. 

As an example, suppose that person A interacts with person B, and learns that
person B knows for a fact that there is a sale going on at Macy's on a particular Sunday.
Once person A learns this fact, person A may go to Macy's on Sunday and
inadvertently meet person C, who also knows the same fact (from someone or
somewhere). Once in proximity, person A and person C may interact and exchange
another bit of knowledge. Notice that once person A and person C interact with each
other, their social network changes. This shows how social networks can change due to
a change in knowledge. The piece of knowledge person A and person C exchange with
each other may affect their decisions about whom to interact with next, where to go,
what to do, and so on. The change in social networks in turn affects which piece(s) of
knowledge get exchanged next.

Choosing whom to interact with based on common bits of knowledge
is known as homophily. A wealth of social science studies has shown that people
consciously and unconsciously prefer to interact with people who resemble themselves
(e.g., have a similar age, hobbies, socioeconomic status, etc.). This kind of interaction
is also known as a social or emotional tie. Another mode of interaction is geared toward
seeking expertise or information one does not have. A patient seeking a doctor is a
perfect example. This is known as an instrumental or information-seeking tie. The social
ties are usually symmetrical or reciprocal, while the instrumental ties are asymmetrical.

All  of  the  above  has  been  turned  into  mathematical  formulas  in  CONSTRUCT. 
CONSTRUCT  mimics  how  people  interact  and  how  social  ties  are  formed  and  dissolved. 
Augmented  by  how  friendships  and  enmities  are  estimated  from  interaction  probabilities, 
CONSTRUCT  is  able  to  predict  the  formation  and  dissolution  of  friendships  and 
enmities,  and  in  the  case  of  Kapferer’s  Zambia  tailor  shop,  the  possibility  of  a  successful 
strike.  An  augmented  version  of  CONSTRUCT  can  encode  that  a  person  knows  another 
person  knows  something.  In  other  words,  transactive  memory  can  be  handled.  The 
mathematical  details  of  CONSTRUCT  are  provided  in  (Carley  1990,  Carley  1991, 
Schreiber  and  Carley  2004). 
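
To make the idea of homophily concrete, the following Python sketch computes interaction probabilities that are proportional to the number of facts two agents share. It is a toy illustration only and should not be read as CONSTRUCT's actual equations, which are given in the papers cited above. The function name `interaction_probabilities`, the proportionality rule, and the sample facts are all assumptions.

```python
def interaction_probabilities(knowledge):
    """Map each agent to a probability distribution over the other agents.

    knowledge: dict of agent name -> set of facts the agent knows.
    Assumed rule: probability is proportional to the number of shared facts.
    """
    probabilities = {}
    for agent, facts in knowledge.items():
        overlap = {
            other: len(facts & other_facts)
            for other, other_facts in knowledge.items()
            if other != agent
        }
        total = sum(overlap.values())
        probabilities[agent] = {
            other: (shared / total if total else 0.0)
            for other, shared in overlap.items()
        }
    return probabilities


# Hypothetical knowledge sets, loosely following the Macy's example above.
knowledge = {
    "A": {"macys-sale", "weather"},
    "B": {"macys-sale"},
    "C": {"macys-sale", "weather", "news"},
}
for agent, row in interaction_probabilities(knowledge).items():
    print(agent, row)
```

Under this toy rule, person A is twice as likely to interact with person C as with person B, because A shares two facts with C and one with B.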


7.2  The  Need  for  Automated  Validation 


In  the  original  CONSTRUCT  paper  (Carley  1990),  CONSTRUCT  was  validated  using 
Kapferer's  Zambia  tailor  shop  worker  and  management  interaction  network  data.  The 
transactive  memory  version  of  CONSTRUCT,  the  CONSTRUCT-TM,  has  also  been 
validated  several  times,  with  the  latest  validation  finding  a  significant  correlation  between 
communication  patterns  in  real-world  organizations  and  agent  interactions  in  the 
CONSTRUCT  model  (Schreiber  and  Carley  2004).  CONSTRUCT  has  a  derivative  called 
DyNet  (Carley,  Reminga,  and  Kamneva  2003)  which  includes  an  agent  removal  feature. 

The validations above were done semi-automatically, with minimal assistance
from computer tools. CONSTRUCT has a large parameter and model space partially due
to its knowledge vector, interaction modes, network data, and information and application
contexts, so it is desirable to have the validation knowledge managed and to have the
validation automated.


7.3  Validation  Scenarios 


Australian anthropologist Bruce Kapferer observed interactions among people in a tailor
shop in Zambia (then Northern Rhodesia) over a period of ten months in 1972. He collected
two sets of data. The first set of data was collected just before an abortive strike. This
instant of time is denoted Time1. The data was collected over a period of a month. After
seven months, Kapferer collected a second set of data. Shortly after this second data
collection, denoted Time2, a successful strike took place. This data collection also took a
month. The data sets consist of both the "instrumental" (work- and assistance-related)
interactions and the "sociational" (friendship, socioemotional) interactions. The data
collections occurred during extended negotiations for higher wages. The data is in the
form of matrices of interactions, which can be transformed into matrices of
person-knowledge.

Here I create three validation scenarios for CONSTRUCT. Validation Scenario I
demonstrates how WIZER can be used to facilitate the validation of CONSTRUCT
against Kapferer's data of empirical average interaction probability among workers.
I use data on 39 maximally heterogeneous workers. “Maximally heterogeneous” means
the workers have the most diverse background and knowledge. They are significantly
different from one another. Here the behavior of workers is examined by probing the
change of the average probability of interaction among workers before the successful
strike. Validation Scenario II shows how WIZER facilitates the validation of
CONSTRUCT against Kapferer's tailor shop data, but with the examination of two group
behaviors and one intergroup behavior. The groups are the maximally heterogeneous
workers and the homogeneous management. Validation Scenario III plays out a
“what-if” scenario in which the management is not homogeneous.


7.3.1  Validation  Scenario  I:  Interaction  Probability  around  the  Time  of 
the  Successful  Strike 

Based on the empirical network data he gathered, Kapferer calculated that the workers
had an interaction probability of 0.005502 just before the successful strike at Time2. In
this validation scenario, the CONSTRUCT model is initialized with the network data at
Time1 (the time just before the abortive strike) for the start of simulation. It is run for
30 simulation trials with a maximum of 45 time-steps per simulation trial.

The  knowledge  base  is  as  follows. 

The causal conceptual diagram:

(causes  interaction  knowledge-exchange) 

(causes  knowledge-exchange  shared-knowledge) 

(causes  shared-knowledge  interaction) 

This  causal  diagram  is  cyclic  but  it  has  a  time  delay  between  the  causes  and  effects.  The 
unary  operand  “op-valid”  means  the  simulation  is  valid  for  the  case. 

The  rules  related  to  the  causal  relations: 

(if-then  (toolow  interaction)  (op-higher  shared-knowledge)) 

(if-then  (toohigh  interaction)  (op-lower  shared-knowledge)) 

(if-then  (lower  shared-knowledge)  (op-lower  knowledge-exchange)) 

(if-then  (higher  shared-knowledge)  (op-higher  knowledge-exchange)) 

(if-then  (lower  knowledge-exchange)  (op-lower  interaction)) 

(if-then  (higher  knowledge-exchange)  (op-higher  interaction)) 

(if-then (lessthan interaction-probability-after-strike emp-avg-int-prob-strike)
(op-toolow interaction))

(if-then (morethan interaction-probability-before-strike emp-avg-int-prob-strike)
(op-toohigh interaction))

(if-then (and
(morethan interaction-probability-after-strike emp-avg-int-prob-strike)
(lessthan interaction-probability-before-strike emp-avg-int-prob-strike))
(op-valid))


The  empirical  data  is  declared  as  follows. 

(setvalue  emp-avg-int-prob-strike  0.005502) 

(setbelief emp-avg-int-prob-strike 1.0)

The  simulation  instantiations  of  variables  are  as  follows. 

(setvalue workers-matrix Kapferer-Time1-workers-data)
which initiates the simulation starting state with the workers data at Time1.

(setmode  construct-interaction-mode  homophily) 

CONSTRUCT is run for 30 trials. The Alert WIZER takes CONSTRUCT's average probability
of interaction output and looks for the values just before and after the
successful strike. The following figure shows the average probability of interaction
among workers as a function of time. The timestep “unit” correlates only loosely with the
actual period of time, which is to say, I do not perform time-validation here. The Alert
WIZER spots the transition (before and after the strike) at Timestep 30, which has an
average interaction probability of 0.005188 (less than 0.005502). The next
timestep, Timestep 31, has an average interaction probability of 0.008612 (more than
0.005502). The two probabilities bracket the empirical probability of
interaction, so the simulated average interaction probabilities are consistent with the
empirical value. This indicates that CONSTRUCT is valid for this case.

[Figure: Average Probability of Interaction]

Figure 16, The Average Probability of Interactions among Workers


The average probability of interactions among workers is shown; the transition
representing the successful strike happens between Timesteps 30 and 31. Kapferer's
empirical average interaction probability of 0.005502 lies between the average interaction
probabilities of Timesteps 30 and 31.

In the WIZER inference trace, the simulation instantiations of outputs are:

(setvalue  interaction-probability-before-strike  0.005188) 

(setvalue  interaction-probability-after-strike  0.008612) 

The  inference  engine  has  the  following  step. 

(if-then  (and  (morethan  0.008612  0.005502) 

(lessthan  0.005188  0.005502)) 

(op-valid)) 

(op-valid) 
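
The bracketing test performed in this step can be sketched in Python as follows. The sketch is not WIZER's code; the function name `find_transition` is hypothetical, and only the two reported timestep values and the empirical probability are taken from the text.

```python
def find_transition(prob_by_step, empirical):
    """Return the first pair of consecutive timesteps that brackets the empirical value.

    prob_by_step: dict of timestep -> average interaction probability.
    Returns (before_step, after_step), or None if no pair brackets the value.
    """
    steps = sorted(prob_by_step)
    for before, after in zip(steps, steps[1:]):
        if prob_by_step[before] < empirical < prob_by_step[after]:
            return before, after
    return None


# Values reported for Timesteps 30 and 31, with Kapferer's empirical probability.
simulated = {30: 0.005188, 31: 0.008612}
transition = find_transition(simulated, 0.005502)
print("(op-valid)" if transition else "(not-valid)", transition)
```

Given a full series of timesteps, the same function would also locate where the transition occurs, which is the role the text assigns to the Alert WIZER.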

One interpretation of the above result is that as the interactions and
shared knowledge increase, the workers are being primed for a leap of faith of increased
unionization (and thus homogenization and radicalization). Increased unionization
increases the risks of confrontation with the management. However, to really explain why
the successful strike happened it is necessary to account for friendships and enmities. The
treatment of friendships and enmities and the explanation of why the successful strike
occurred were given in (Carley 1990).


7.3.2  Validation  Scenario  II:  Maximally  Heterogeneous  Workers  and 
Homogeneous  Management 

In the Zambia tailor shop, the management consisted of 4 Indians among the
mostly African workers. One of these four, Patel, served as the actual factory
manager. Most of the interactions between workers and management occurred between
the workers and Patel. The management was homogeneous: its members interacted with each
other most of the time and had the same culture and economic status.

During the period between the abortive strike and the successful strike, the
interactions among workers and between workers and management increased. In this
validation scenario, WIZER is set up to detect and compare the trends of
the change of interaction probabilities within and between groups. The knowledge base is
as follows.

The causal conceptual diagram:

(causes (homogeneous management) (increasing intergroup-interaction-change))
(causes (homogeneous management) (increasing workers-interaction-change))
(causes (homogeneous management)
        (higherthan intergroup-interaction-change workers-interaction-change))
(causes (homogeneous management)
        (higherthan workers-interaction-change management-interaction-change))
(causes (wage negotiations) (increasing intergroup-interaction-change))

The rules related to the causal relations relevant here are:

(if-then (higherthan workers-interaction-change management-interaction-change)
         (op-valid))

(if-then (higherthan intergroup-interaction-change workers-interaction-change)
         (op-valid))

The simulation instantiations of variables are as follows.

(setvalue management homogeneous)
(setvalue workers-matrix Kapferer-Time1-workers-data)
(setvalue construct-interaction-mode homophily)

CONSTRUCT is run for 30 Monte-Carlo trials. Alert WIZER processes the
output interaction curves and produces the curve trend comparisons as symbolic or
semantic information. The simulation instantiations of outputs are:

(higherthan workers-interaction-change management-interaction-change)
(higherthan intergroup-interaction-change workers-interaction-change)

The Inference Engine then produces:

(op-valid)
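
The inference step above is an instance of forward chaining: symbolic facts produced from the simulation outputs are matched against rule antecedents, and matching rules add their consequents. The following Python sketch illustrates this under stated assumptions. The tuple encoding of facts and the `forward_chain` function are hypothetical and are not WIZER's actual data structures; only the rule and fact contents are taken from the text.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose antecedent is a known fact,
    adding each consequent until nothing new can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


# Rules from the text, encoded as (antecedent, consequent) tuples.
rules = [
    (("higherthan", "workers-interaction-change", "management-interaction-change"),
     ("op-valid",)),
    (("higherthan", "intergroup-interaction-change", "workers-interaction-change"),
     ("op-valid",)),
]

# Symbolic output facts reported by Alert WIZER in the text.
facts = {
    ("higherthan", "workers-interaction-change", "management-interaction-change"),
    ("higherthan", "intergroup-interaction-change", "workers-interaction-change"),
}

derived = forward_chain(facts, rules)
print(("op-valid",) in derived)
```

In this sketch either rule alone is enough to derive (op-valid), which follows the rules as written; whether WIZER requires both to fire is not stated in the text.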

This means that the CONSTRUCT model for the case of homogeneous management
validly reproduces the empirical trends of the change of interaction probabilities for the
workers-group, the management-group, and the workers-management intergroup as
observed by Kapferer during the period between the abortive strike at Time1 and the
successful strike at Time2.

The  following  figure  shows  the  interaction  curves  as  measured  by  the  percent 
change  of  probability  of  interactions. 


Figure 17, Percent Change of Interaction Probabilities for the Workers Group, the
Management Group, and the Intergroup

One interpretation of the results is that the significantly increased change in
workers-management interaction can be a catalyst for the strike. How exactly this plays
out, however, depends on how enmities and friendships are formed. Increased intragroup
interactions among workers could lead to more integration among workers, forming an
almost-homogeneous group that challenges the homogeneous management group, resulting in a
successful strike. This almost-homogeneous worker state stands in contrast to the initial
maximally heterogeneous state that the workers were in.


7.3.3  Validation  Scenario  III:  Maximally  Heterogeneous  Workers  and 
Heterogeneous  Management 

Previously WIZER has shown that the CONSTRUCT model for the homogeneous
management case is valid in light of the empirical trends of interaction probability
change. It also indicates that homogeneous management could be a factor contributing to
the successful strike. Now we ask what would transpire if the management were
not homogeneous. This question is encoded in the ontology. WIZER can handle the question
“what if the management is heterogeneous?” by doing a search in the ontology, forming a
new causal conceptual diagram, and then doing hypothesis testing. In the extended N3
notation, the ontology for the management is written as having the attributes:

<management>  <has-type-of>  <homogeneous,  heterogeneous>  . 

The domain knowledge’s inference engine executes the what-if statement: if the
homogeneous type of management has been probed, then probe the other types of
management. As shown, the probing of other types of management is assisted by the
ontology. This is similar to model perturbation in model-based reasoning. The exact
mapping of the management attributes of homogeneous or heterogeneous to the
interaction and person-knowledge matrices is declared by Alert WIZER's symbolic or
semantic characterization of inputs (rather than outputs), with help from the ontology. As
described earlier, Alert WIZER is capable of symbolic and semantic categorization
of numeric and network data.
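
The what-if step described above (once one management type has been probed, look up the remaining types in the ontology) can be illustrated with a short Python sketch. The dictionary encoding of the ontology triple and the function `unprobed_values` are assumptions made for illustration only; they do not reproduce WIZER's N3 processing.

```python
# Ontology triple from the text:
#   <management> <has-type-of> <homogeneous, heterogeneous> .
# encoded here (hypothetically) as a dictionary keyed by (subject, predicate).
ontology = {
    ("management", "has-type-of"): ["homogeneous", "heterogeneous"],
}


def unprobed_values(ontology, subject, predicate, probed):
    """Return the ontology values for (subject, predicate) not yet probed."""
    return [v for v in ontology.get((subject, predicate), []) if v not in probed]


# The homogeneous case has already been validated, so it counts as probed.
remaining = unprobed_values(ontology, "management", "has-type-of", {"homogeneous"})

# Each remaining value becomes a what-if instantiation for a new simulation run.
hypotheses = [("setvalue", "management", value) for value in remaining]
print(hypotheses)
```

The search happens over two symbolic values rather than over the matrix entries that realize them, which is the point the text makes about searching in conceptual space.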

The causal conceptual diagram for this what-if case becomes:

(causes (heterogeneous management) (higher intergroup-interaction-change))
(causes (heterogeneous management) (higher workers-interaction-change))
(causes (heterogeneous management)
        (higherthan intergroup-interaction-change workers-interaction-change))
(causes (heterogeneous management)
        (higherthan workers-interaction-change management-interaction-change))

The rules related to the causal relations are:

(if-then (higherthan workers-interaction-change management-interaction-change)
         (heterogeneous management))

(if-then (higherthan intergroup-interaction-change workers-interaction-change)
         (heterogeneous management))
The simulation instantiations of variables are as follows.

(setvalue management heterogeneous)
(setvalue workers-matrix Kapferer-Time1-workers-data)
(setvalue construct-interaction-mode homophily)

CONSTRUCT is run for 30 Monte-Carlo trials. Alert WIZER processes the
output interaction curves and gives out the comparisons. The following figure shows the
percent change of interaction probability for the heterogeneous management group, the
workers-management intergroup, and the maximally heterogeneous workers group.


Figure 18, Percent Change of Interaction Probability for Heterogeneous
Management Group, Heterogeneous Workers Group, and the Intergroup


The simulation instantiations of outputs are then:

(higherthan intergroup-interaction-change workers-interaction-change)
(equal workers-interaction-change management-interaction-change)

This means the initial assertion of
(higherthan workers-interaction-change management-interaction-change) is false. Thus the
initial rule of:

(if-then (higherthan workers-interaction-change management-interaction-change)
         (heterogeneous management))

is false too. Moreover, the initial causal relation of:

(causes (heterogeneous management)
        (higherthan workers-interaction-change management-interaction-change))

is correspondingly false. This results in WIZER purging one causation and one rule from
the CONSTRUCT model declaration for the heterogeneous case. The causation and rule
are replaced by:

(causes (heterogeneous management)
        (equal workers-interaction-change management-interaction-change))

(if-then (equal workers-interaction-change management-interaction-change)
         (heterogeneous management))

In this simple way, WIZER “learns” by inference after the model outputs are compared
against the expected relations.
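
The purge-and-replace step above can be sketched as follows. This is a minimal illustration, assuming causal relations are stored as (cause, effect) tuples; the function `revise` and the tuple layout are hypothetical, and WIZER's own revision mechanism may differ.

```python
def revise(knowledge, observed):
    """Replace every expected relation that is contradicted by an observed
    relation over the same ordered pair of quantities.

    knowledge: list of (cause, (relation, a, b)) tuples
    observed:  list of (relation, a, b) tuples from the simulation outputs
    Returns (revised knowledge, list of purged entries).
    """
    observed_by_pair = {(a, b): rel for rel, a, b in observed}
    revised, purged = [], []
    for cause, (rel, a, b) in knowledge:
        seen = observed_by_pair.get((a, b))
        if seen is not None and seen != rel:
            purged.append((cause, (rel, a, b)))
            revised.append((cause, (seen, a, b)))
        else:
            revised.append((cause, (rel, a, b)))
    return revised, purged


HETERO = ("heterogeneous", "management")
knowledge = [
    (HETERO, ("higherthan", "intergroup-interaction-change", "workers-interaction-change")),
    (HETERO, ("higherthan", "workers-interaction-change", "management-interaction-change")),
]
observed = [
    ("higherthan", "intergroup-interaction-change", "workers-interaction-change"),
    ("equal", "workers-interaction-change", "management-interaction-change"),
]

revised, purged = revise(knowledge, observed)
for entry in revised:
    print(entry)
```

Only the relation between the workers' and management's changes is replaced; the confirmed intergroup relation is kept unchanged.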

The revised relations thus describe the interaction probability trends that the
CONSTRUCT model produces for heterogeneous management. The result also shows that
WIZER can reduce the amount of simulation parameter search by doing the search or
inference in the conceptual space or in the ontology. The homogeneous or heterogeneous
conceptual symbol has multiple manifestations in the interaction (and the
person-knowledge) matrices. We do not need, however, to examine each and every combination
of the matrix values, as the values can be categorized at the symbolic level by the
semantic label of homogeneous or heterogeneous in the ontology. If robustness is desired,
we can take a statistical sample of several different matrix values, but nothing
approaching brute-force or Monte Carlo sampling of the matrix space is needed.

Additionally, to get at the difference between homogeneous and heterogeneous
management, WIZER is set up to compare the results of the two scenarios above. Alert
WIZER compares corresponding curves across the scenarios as well as the differences
between curves. It produces:

(morethan intergroup-interaction-probability-change-heterogeneous
          intergroup-interaction-probability-change-homogeneous)

This new relation can be used for further inference in the WIZER Inference Engine.
The above results can be interpreted as:

(1) When the management is heterogeneous, they are practically very similar to
workers, thus their percent change of interactions is almost the same.

(2) When the management is heterogeneous, the intergroup interaction change is
higher than that of the case of homogeneous management due to the increased
management-workers interactions, as management and workers are similar. The
management does not form a cohesive/homogeneous group.

Whether the heterogeneous management could prevent a successful strike, however,
cannot be explained by the interaction probability change alone, as the measures of
friendship and enmity are needed. The increased interaction probability change could
lead to both increased unification and friendship (for workers) and increased strife (for
the workers-management intergroup). How the heterogeneity of management affects
enmities and friendships, which in turn affects the chance of a successful strike taking
place, depends on how friendships and enmities are determined.


7.4  Validation  Measures 


Because validation depends on the specific knowledge used, the validity of the results
is stated per scenario as follows:

1.  Average  Interaction  Probabilities  around  the  Successful  Strike:  the  average 
interaction  probabilities  bracket  the  empirical  interaction  probability  around  the 
successful  strike.  This  means  the  CONSTRUCT  model  is  valid  as  measured  by 
the  average  interaction  probability  knowledge  for  the  workers-group. 

2.  Maximally  Heterogeneous  Workers  and  Homogeneous  Management:  the 
intergroup’s  increased  change  of  interaction  probability  is  much  more  than  the 
workers’  change  of  interaction  probability,  which  in  turn  is  more  than  the 
management’s  change  of  interaction  probability.  This  fits  the  trends  of  what 
empirically  transpired  between  the  period  of  time  after  the  abortive  strike  and 
before  the  successful  strike,  during  which  wage  negotiations  occurred. 
3. Maximally Heterogeneous Workers and Heterogeneous Management: this is a
hypothesis building and testing scenario, so no validity value is assigned,
as there is no corresponding empirical case. However, as workers’ and
heterogeneous management’s changes of interaction probability are more or less
equal, it indicates that heterogeneity makes workers and management behave
more like each other. Also, as the increase in the change of the interaction
probability between workers and management (the intergroup) is higher than
that of the homogeneous management case, it indicates that heterogeneity contributes
to the increased interaction between workers and the non-homogeneous
management. It seems that diversity has resulted in more interactions between
different groups. How these increased interactions contribute to friendships and
enmities, however, depends on how friendships and enmities are formed.


7.5  WIZER  versus  Response  Surface  Methodology  for 
CONSTRUCT  Validation 


CONSTRUCT has many parameters: the size of the knowledge vector, number of agents,
type of communication mode (homophily, information seeking, etc.), type of exchange,
interaction matrix, number of groups, knowledge matrix (or the percentage of known
facts), proximity matrix, and others. The task of completely characterizing CONSTRUCT
using Response Surface Methodology or RSM (Myers and Montgomery 2002, Carley,
Kamneva, and Reminga 2004) becomes unmanageable due to combinatorial explosion.
Suppose, for the best case, that we have 3 levels (3 different values) for each parameter in
a CONSTRUCT program having a total of 8 parameters. This gives rise to 3^8 = 6,561
cells or cases. If the cells all correspond to non-stochastic variables, then the number of
virtual experiments needed is 6,561, which is huge. Let's assume each stochastic cell
needs 40 trials to get statistically significant results. If all the above cells correspond to
stochastic variables, then that number increases to 6,561 x 40 = 262,440, which is gigantic.
Doing 262,440 virtual experiments is difficult using the current state of computer technology.
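
The arithmetic above is the standard full-factorial count: the number of cells is the product of the number of levels of each parameter, multiplied by the trials per cell when the model is stochastic. A minimal Python sketch follows; the function names are illustrative, and the figures (8 parameters, 3 levels, 40 trials) are those given in the text.

```python
from math import prod


def cell_count(levels_per_parameter):
    """Number of cells in a full-factorial design."""
    return prod(levels_per_parameter)


def experiment_count(levels_per_parameter, trials_per_cell=1):
    """Virtual experiments needed: cells times trials per cell."""
    return cell_count(levels_per_parameter) * trials_per_cell


levels = [3] * 8                                            # 8 parameters, 3 levels each
cells = cell_count(levels)                                  # 3^8 = 6,561
stochastic_trials = experiment_count(levels, trials_per_cell=40)  # 262,440

print(cells, stochastic_trials)
```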
In reality, experimenters think through and choose a few parameters and
parameter values that correspond to policy questions and “common sense”. The following
table displays the number of cells corresponding to a typical CONSTRUCT setup.


Table 6. Number of Cells for a Typical CONSTRUCT Experiment

| Parameter                                      | Categories                            | Size      |
|------------------------------------------------|---------------------------------------|-----------|
| Number of groups                               | 1                                     | 1 (fixed) |
| Number of agents                               | 100                                   | 1 (fixed) |
| Knowledge size                                 | 100                                   | 1 (fixed) |
| Percent of known facts in the knowledge matrix | 10%, 30%, 50%, 70%                    | 4         |
| Communication mode                             | Homophily, information seeking, 50/50 | 3         |
| Proximity levels                               | 20%, 50%, 70%                         | 3         |

The above gives rise to 4 x 3 x 3 = 36 cells. As CONSTRUCT is stochastic, each cell
needs 40 virtual experiments to get statistically significant results. Thus, the total number
of virtual experiments required is 36 x 40 = 1,440 simulation trials, which is large but
manageable.

The validation cases that determine the effects of homogeneous versus
heterogeneous management, with the relative magnitude of change in the average
interaction probability curves as the performance measure, are more complicated. The
following table shows the number of virtual experiments needed, assuming that each
interaction matrix element has only 2 levels (binary).


Table 7. Heterogeneous vs Homogeneous Management Cell Count

| Parameter                                                                           | Categories                                         | Size       |
|-------------------------------------------------------------------------------------|----------------------------------------------------|------------|
| Number of groups                                                                    | Workers, management, intergroups                   | 1 (fixed)  |
| Number of agents                                                                    | 43                                                 | 1 (fixed)  |
| Knowledge size                                                                      | 3045, a function of the initial interaction matrix | 1 (fixed)  |
| Percent of known facts in the knowledge matrix                                      | Initiated by the interaction matrix at time Time1  | 1 (fixed)  |
| Communication mode                                                                  | Homophily                                          | 1 (fixed)  |
| Initial interaction matrix, assuming binary elements, assuming no self interactions | 2^(43x42/2) = 2^903                                | 6.7622e271 |
Thus probing the effects of heterogeneity or homogeneity of the interaction matrix on the
relative magnitude of the change in the average interaction probability curves takes a
gigantic number of virtual experiments even if the interaction matrix elements are binary
and the program is non-stochastic. If the elements are not binary, but can instead take an
integer value from 0 to 20 (21 levels), and/or the program is stochastic (which it is), then
the number of virtual experiments needed becomes impossibly large.

Experimenters, however, think through the above problem of the huge number of
virtual experiments needed. One solution is to focus on changes in the management
part of the interaction matrix, instead of the total management and workers interaction
matrix. This results in the following table. The management consists of only 4 people.


Table 8. Revised Heterogeneous vs Homogeneous Management Cell Count

| Parameter                                                                           | Categories                                         | Size      |
|-------------------------------------------------------------------------------------|----------------------------------------------------|-----------|
| Number of groups                                                                    | Workers, management, intergroups                   | 1 (fixed) |
| Number of agents                                                                    | 43                                                 | 1 (fixed) |
| Knowledge size                                                                      | 3045, a function of the initial interaction matrix | 1 (fixed) |
| Percent of known facts in the knowledge matrix                                      | Initiated by the interaction matrix at time Time1  | 1 (fixed) |
| Communication mode                                                                  | Homophily                                          | 1 (fixed) |
| Initial interaction matrix, assuming binary elements, assuming no self interactions | 2^(4x3/2) = 2^6                                    | 64        |

The above table shows that it takes 64 cells or virtual experiments to probe the effects of
the initial interaction matrix for the heterogeneous versus homogeneous management case.
This assumes that the interaction elements are binary and the program is non-stochastic.
As CONSTRUCT is stochastic, it requires 64 x 40 = 2,560 virtual experiments for the
binary element case, which is perfectly manageable. However, the interaction matrix
(based on Kapferer's empirical data) can contain integer levels of interaction up to 21
levels (counting level 0). This brings the total required virtual experiments to 21^6 =
85,766,121 for the non-stochastic case, which is gigantic. Of course, experimenters may
reduce the levels to symbolic levels of “low, medium, or high”, which reduces the total
required cells to 3^6 = 729 for the non-stochastic case. This corresponds to 729 x 40 =
29,160 virtual experiments for the stochastic case, which is large, but still manageable.
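
The counts quoted above all follow from one formula: a symmetric interaction matrix over n agents with no self interactions has n(n-1)/2 free elements, and with L levels per element there are L^(n(n-1)/2) distinct matrices. The Python sketch below reproduces the figures under that assumption; the function names are illustrative and not part of CONSTRUCT or WIZER.

```python
def unordered_pairs(agents):
    """Free elements of a symmetric matrix with no self interactions."""
    return agents * (agents - 1) // 2


def matrix_cells(agents, levels):
    """Number of distinct interaction matrices with `levels` values per element."""
    return levels ** unordered_pairs(agents)


TRIALS_PER_CELL = 40  # trials per stochastic cell, as assumed in the text

full_binary = matrix_cells(43, 2)   # all 43 agents: 2^903, about 6.76e271
mgmt_binary = matrix_cells(4, 2)    # 4 managers: 2^6 = 64
mgmt_21 = matrix_cells(4, 21)       # 21 integer levels: 21^6 = 85,766,121
mgmt_symbolic = matrix_cells(4, 3)  # low/medium/high: 3^6 = 729

print(unordered_pairs(43), mgmt_binary, mgmt_21, mgmt_symbolic)
print(mgmt_binary * TRIALS_PER_CELL, mgmt_symbolic * TRIALS_PER_CELL)
```

Python integers have arbitrary precision, so 2^903 is computed exactly rather than as a floating-point approximation.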
The above only considers the obstacles to RSM validation caused by the large
number of cells or virtual experiments needed. Another equally hard, if not harder,
problem is devising a function relating the independent variables to the performance
measure. As the performance measure for the above example is in the form of the relative
magnitude between curves (the curves are an emergent property which changes little for
small changes in the interaction matrix), the direction of changes may be diluted by
random noise in the system. Indeed, for the heterogeneous management case, the curves
of management and of workers are judged to be the same even though they differ by a
small but non-zero percentage. It is the relative magnitude that matters. The matter is
further complicated by the difficulty of determining in which direction to descend on the
response surface, because homogeneity is an abstract property of the
interaction matrix elements.

Of course, experimenters may reduce the needed processing to the extreme by
inferring that only interaction matrices representative of homogeneity and heterogeneity
need to be probed. This is exactly what WIZER does. WIZER further enhances the thinking
through and the use of “common sense” that experimenters employ by adding
knowledge representation and knowledge-based and ontological reasoning. It codifies the
symbolic thinking and converts “common sense” into computer operable rules. This
codification makes computer inferences possible. Not all parameters and/or parameter
value combinations need to be probed. Extreme points may have to be probed to check the
robustness of the model, but not all intermediate points have to be probed. Without
knowledge, WIZER degenerates to having to deal with the same number of virtual
experiments or cells as RSM does. With knowledge, WIZER only needs 2 virtual
experiments for the non-stochastic case and 2 x 40 = 80 virtual experiments for the
stochastic case, which is the case for the CONSTRUCT program. Sampling the area
around the two cells for statistical robustness is an option, but not a requirement. WIZER
makes the probing of the effects of homogeneity and heterogeneity of the initial
interaction matrix perfectly manageable.
7.6 Summary


WIZER  has  partially  validated  CONSTRUCT.  It  shows  that  CONSTRUCT  is  valid  with 
respect  to  the  average  interaction  probability  knowledge,  using  Kapferer’s  empirical 
average  probability  of  interaction  data.  It  also  shows  that  CONSTRUCT  is  valid  with 
respect  to  the  general  trend  and  the  relative  size  of  the  change  in  the  probability  of 
interactions  among  workers,  among  management,  and  between  workers  and  management. 
Finally,  WIZER  is  shown  to  be  able  to  construct  a  simple  hypothesis  (what  if  the 
management  is  heterogeneous)  from  its  ontology  using  ontological  reasoning,  and  test  it 
successfully.  In  the  process,  WIZER  gains  new  chunks  of  knowledge.  Here,  WIZER  is 
also  shown  to  be  able  to  reduce  the  search  space  significantly,  by  simply  examining  the 
“heterogeneous”  variable  value  in  knowledge  space,  which  has  many  manifestations  in 
the  management  interaction  matrix.  Instead  of  the  brute-force  examination  of  all  the 
manifestations  of  heterogeneity  in  the  management  interaction  matrix,  an  examination  of 
one  or  at  most  several  samples  (not  all)  of  them  is  sufficient. 
Chapter VIII: Strengths and Weaknesses of WIZER


This chapter discusses the strengths and weaknesses of the current WIZER
implementation. This includes comparisons of WIZER against the Subject Matter
Expert approach and Response Surface Methodology.


8.1 The Strengths of WIZER


WIZER  is  a  general  knowledge-based  and  ontological  simulation  validation  and  model- 
improvement  tool.  It  has  the  following  advantages: 

1. Unlike formal methods, WIZER can validate simulations against empirical data
and knowledge. The results from several validation scenarios indicate that
WIZER can be used to improve simulation models by perturbations in the model
description. The perturbations are guided by ontological and knowledge
inference.

2. The models and rules in WIZER are relatively easy to specify and use. The
difficulty is at the programmer level, not at the expert level. By contrast, the
technical difficulty of formal methods, which requires expertise at the computer
scientist level, and the resulting high cost in time and resources hinder the
adoption of formal methods for software verification.

3. Unlike statistics, simulations validated by WIZER are more precise and require
fewer assumptions. Instead of assuming the abstract notion of “sample”,
simulations can represent entities closely, in detail, with symbolic information.
Moreover, they do not assume a normal distribution and random sample.
4. WIZER can understand simulation outputs (e.g., curves) semantically and
ontologically. It can also understand simulation inputs, occurrences, and empirical
data semantically and ontologically.

5. WIZER can reduce the amount of search needed for validation.

6. WIZER can focus the search on the relevant area of the search space.

7. WIZER can assist in closing the loop of modeling, simulation, inference, and
experiment design.

8. WIZER performs model perturbations rather than acting as a pure rule-based
system. WIZER’s rules are derived from and tied to the model. While heuristics can be
used, the rules can encode deep knowledge.


8.2  The  Weaknesses  of  WIZER 

As  a  tool,  the  currently  implemented  WIZER  has  the  following  weaknesses: 

1.  It  has  no  experiment  design  module.  The  experiment  design  module  can  be 
constructed  utilizing  ontology/semantics  and  causal  rules.  It  is  an  extension  of  the 
model-improvement  module,  with  hypothesis  building  and  experiment  design 
construction  using  ontology/semantics  and  causal  rules  added. 

2.  It  has  a  limited,  if  powerful,  mode  of  inference  in  the  form  of  forward-chaining 
and  ontological  reasoning.  There  is  a  need  for  the  research  into  more  sophisticated 
reasoning,  cognitive,  and/or  machine  learning  techniques  to  enhance  WIZER. 

3. It has minimal control of statistical tools. What is needed is an extensive
ontology or semantics that understands statistical and mathematical tools (and
concepts) and facilitates the use of them. WIZER currently implements only a
rudimentary understanding of some statistical routines. OpenMath and OWL Full
are a good starting point for the creation of an extensive ontology for calculus,
statistics, geometry, and other mathematical concepts and tools.

4. It has no simulation control. What is needed is a simulation control module which
is capable of halting the simulation once the result is obtained, i.e., an interactive
module based on simulation, knowledge inference, and human input. This module
should also be capable of an interactive simulation mode.

5.  It  does  not  learn,  except  in  the  sense  of  search  and  hypothesis  building.  Machine 
learning  and  causal  learning  from  data  can  be  added. 

6.  It  still  requires  the  validation  of  its  knowledge  bases.  A  tool  to  validate  knowledge 
bases  automatically  with  empirical  data  is  needed. 

7.  Related  to  (6)  is  the  issue  of  how  precisely  to  weigh  and  assess  knowledge  against 
data,  if  the  two  are  in  conflict  with  each  other.  An  ontology  or  semantic  construct 
to  do  this  is  needed. 

Except  for  the  last  three  points  (points  5,  6,  and  7),  the  above  weaknesses  are  not 
conceptual.  They  are  implementation  issues. 
8.3 WIZER and Subject Matter Expert Approach


In VV&A, subject matter experts evaluate the validity of the simulations. Subject matter
experts have the expert insights, experience, and knowledge for the task. They are,
however, prone to pitfalls such as cognitive limitations (especially with respect to
complex large simulations), judgment biases, and implicit decision making. WIZER
promotes clarity, transparency, and reproducibility. The following table summarizes the
capabilities or features of subject matter experts versus WIZER.

Table 9. Subject Matter Experts versus WIZER

| Feature                     | Subject matter experts       | WIZER                                                     |
|-----------------------------|------------------------------|-----------------------------------------------------------|
| Learning                    | Yes                          | No, except search and hypothesis building & testing       |
| Large problem handling      | With difficulty              | Facilitated                                               |
| Multiple domain integration | Difficult, by Delphi method  | Facilitated                                               |
| Intuition and insight       | Yes                          | No                                                        |
| Transparency                | With difficulty              | Yes, with grounded semantics and empirical underpinnings  |
| Clarity                     | Difficult for large problems | Yes                                                       |
| Implicit biases             | Yes                          | No                                                        |
| Knowledge level             | Expert level (deep knowledge)| Intermediate (ontological reasoning and rules)            |

Instead  of  working  in  isolation,  subject  matter  experts  and  WIZER  can  work  in 
synergy.  This  results  in  better  and  deeper  knowledge,  encoding  of  intuition,  and  learning 
for  WIZER,  and  in  transparency,  clarity,  and  large  problem  solving  capabilities  for 
subject  matter  experts.  The  trend  of  computational  and  inferential  help  is  evident  in 
science,  where  the  use  of  computational  resources  in  the  form  of  cyber-environments  and 
packaged  data  mining/machine  learning  modules  for  scientists  has  increased. 
8.4 WIZER and Response Surface Methodology


Response  Surface  Methodology  (RSM)  is  a  set  of  statistical  and  mathematical  techniques 
for  developing,  improving,  and  optimizing  processes  (Myers  and  Montgomery  2002). 
The  applications  of  RSM  are  in  situations  where  several  input  variables  potentially 
influence  some  performance  measure  or  quality  characteristic  of  the  process.  As  a 
simulation  model  can  be  thought  of  as  a  mechanism  that  turns  input  parameters  into 
outputs,  it  can  be  approximated  with  RSM.  The  performance  measure  or  quality 
characteristic  is  called  the  response  or  the  yield.  The  input/process  variables  are  known  as 
independent variables. The response surface methodology includes (Carley, Kamneva,
and Reminga 2004):

1. Experimental strategy for exploring the space of the independent variables,

2.  Empirical  statistical  modeling  to  develop  an  appropriate  approximating 
relationship  between  the  independent  variables  and  the  yield, 

3.  Optimization  methods  to  find  independent  variable  values  that  produce  desirable 
values  of  the  yield. 

RSM can be used for validation, but the resulting state space is large and must then be
explored using Monte Carlo, simulated annealing, and steepest ascent methods. RSM is a
mathematical method, in contrast to WIZER, which is a knowledge-based method. RSM
screens which independent variables are important, builds a first-order model to get close
to the optimum, and then builds a second-order (or higher-order polynomial) model
near the optimum to get an accurate response surface. More details on how RSM is used
for validation can be found in (Carley, Kamneva, and Reminga 2004). The following
table contrasts RSM with WIZER.

Table 10. Response Surface Methodology versus WIZER

Feature | Response Surface Methodology | WIZER
Operation | Mathematical | Knowledge-based
Search or optimization | Simulated annealing and steepest ascent | Knowledge inference
Large problem handling | Not able to | Facilitated
Local minima | Can get trapped with no means of escape | Depends on knowledge inference. Knowledge inference can lead to escape from local minima
Smoothness of surface | Requires some smoothness of response surface | No requirement for smoothness of response surface. It can be jagged.
Computational burden | High, most states must be probed | Intermediate, knowledge-inference allows focus of search
Semantics correspondence of search steps | Very low (e.g., what a steepest ascent step means semantically is often not clear) | High, as it is knowledge-based
Causal processing | No | Yes
Critical parameters | Must be known a priori | Can be inferred
Parameter variation | Varies continuously throughout the experimental range tested | Varies non-continuously or continuously, according to knowledge inferences
Use of good statistical principles | Deficient | Yes
Handling of time-variant and dynamic response | With difficulty | Facilitated

In RSM, the response surface represents the search space in which optimum
solutions are sought. WIZER adds to the surface constraints and information based on knowledge
and ontology of the problem. Due to this additional knowledge, numerical gradient ascent
on the surface is assisted with knowledge about the local surface area. The sampling
strategy (the choice of local points on the surface) helps to determine the gradient and is
guided by knowledge inference. Absent a smooth surface, WIZER helps the gradient
ascent to fly over to other “hills”. Local maxima (or minima) can be avoided or tunneled
through (or bridged over) by knowledge and ontological inference. In effect, WIZER acts
as a symbolic “response surface” method.
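To make the contrast concrete, the following Python sketch climbs a one-dimensional response surface with two hills separated by a flat valley. Everything in it is an illustrative assumption rather than WIZER's implementation: the `response` function, the step size, and the `knowledge_hint` rule, which stands in for a knowledge-base inference that a better regime lies elsewhere. Plain hill climbing stops on the lower hill, while the hint moves the search to the other hill before climbing resumes.

```python
def response(x):
    """Hypothetical response surface: a low hill at x=2 and a high hill at x=8."""
    low = 1.0 - (x - 2.0) ** 2
    high = 3.0 - (x - 8.0) ** 2
    return max(low, high, 0.0)


def hill_climb(x, step=0.1, max_iter=1000):
    """Greedy ascent: move to the better neighbor until no neighbor improves."""
    for _ in range(max_iter):
        best = max((x - step, x, x + step), key=response)
        if response(best) <= response(x):
            return x
        x = best
    return x


def knowledge_hint(x):
    """Stand-in for a rule-based inference (assumed for this sketch):
    if the search settles in the low regime (x < 5), try the high regime."""
    return 7.0 if x < 5.0 else None


start = 1.0
numeric_only = hill_climb(start)

# Knowledge-guided variant: accept the hinted restart only if it improves the response.
guided = numeric_only
hint = knowledge_hint(guided)
if hint is not None:
    candidate = hill_climb(hint)
    if response(candidate) > response(guided):
        guided = candidate

print("numeric only:", round(numeric_only, 1), round(response(numeric_only), 2))
print("with hint:   ", round(guided, 1), round(response(guided), 2))
```

The jump in `knowledge_hint` is discontinuous by design, mirroring the non-continuous parameter variation listed for WIZER in Table 10.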


8.5 WIZER and Sensitivity Analysis


One  of  the  simulation  goals  is  to  determine  how  changes  in  the  input  parameters  and 
simulation  variables  affect  the  output  variables,  in  other  words,  how  robust  the  output  is 
with  respect  to  changes  or  even  violations  in  input  variables.  Sensitivity  analysis 
(Clement  and  Reilly  2000,  Breierova  and  Choudhari  2001)  is  a  procedure  to  determine 
the  sensitivity  of  the  outputs  to  changes  in  input  parameters.  If  a  small  change  in  a 
parameter  results  in  relatively  large  changes  in  the  outputs,  the  outputs  are  said  to  be 
sensitive  to  that  parameter.  This  may  mean  that  the  parameter  has  to  be  determined  very 
accurately  or  that  an  alternative  has  to  be  sought  to  get  low  sensitivity.  Sensitivity 
analysis  is  numerical.  WIZER  does  what  can  be  viewed  as  symbolic  sensitivity  analysis 
or  knowledge  sensitivity  analysis,  as  it  probes  the  changes  in  the  knowledge  space  in 
addition  to  the  simulation/parameter/numeric  space. 
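As a minimal illustration of the numerical procedure, the sketch below perturbs one input at a time by a small relative amount and reports the relative change in the output per relative change in the input. The `toy_model` function and its parameter names are placeholders invented for this example; they are not part of BioWar or CONSTRUCT.

```python
def toy_model(params):
    """Placeholder simulation: output depends strongly on 'rate', weakly on 'radius'."""
    return params["rate"] ** 2 * 100.0 + params["radius"] * 0.5


def one_at_a_time_sensitivity(model, baseline, rel_change=0.01):
    """Return {name: relative output change / relative input change} per parameter."""
    base_out = model(baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1.0 + rel_change)
        out = model(perturbed)
        result[name] = ((out - base_out) / base_out) / rel_change
    return result


baseline = {"rate": 0.5, "radius": 2.0}
sens = one_at_a_time_sensitivity(toy_model, baseline)

# List parameters from most to least sensitive.
for name, s in sorted(sens.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {s:.3f}")
```

A parameter with a large value here would be one that, in the terms used above, has to be determined very accurately.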


8.6  WIZER  and  Influence  Diagram 


An  influence  diagram  (Clement  and  Reilly  2000)  is  a  simple  visual  representation  of  a 
decision  problem.  Influence  diagrams  offer  an  intuitive  way  to  identify  and  display  the 
essential  elements,  including  decisions,  uncertainties,  and  objectives,  and  how  they 
influence each other. An influence diagram includes the decision node, the chance node, the objective node,
and the compute (general variable) node. Influence diagrams offer visual aids for humans
to  construct  a  correct  model.  WIZER,  on  the  other  hand,  offers  causal  diagrams  in  the 
form  of  rules  and  ontologies  for  computers  to  process  automatically  to  aid  humans  in  the 
validation and improvement of a model.


8.7 WIZER and Simulation Systems


The input to WIZER includes the simulation model and the knowledge bases and ontologies tied to
the model. Each simulation should be accompanied by knowledge bases and inference,
and validated. This knowledge-integrated simulation facilitated by WIZER allows us to
reason with simulation aid (to reason via simulations and virtual experiments), instead of
just reasoning logically or probabilistically (statistically). Simulation-based inference is
made feasible through WIZER. Moreover, once the validated simulations are used to
construct and test hypotheses against empirical data, the knowledge bases and ontologies
can be updated or learned. Compared with Bayesian Artificial Intelligence, simulation-based
Artificial Intelligence is clearer and more accurate. Instead of integrating symbolic and
subsymbolic/connectionist systems as the ACT-R model does (Anderson et al. 1997),
WIZER integrates symbolic (knowledge-based) and simulation systems.


8.8  WIZER  and  Knowledge-based  Systems 


WIZER grounds knowledge-based systems through validated simulation against
empirical data. The validated simulation emulates processes and mechanisms of the real
world. Inference, ontology, knowledge bases, and simulation are tied with each other.
This differentiates WIZER from conventional knowledge-based systems such as Cyc
(Lenat and Guha 1990). Cyc has the brittleness of knowledge-based systems due to its
pure logic foundation even though it has been fed a massive amount of facts and rules.


8.9 Quantitative Metrics


In  order  to  show  the  differences  between  WIZER  and  RSM,  quantitative  metrics  are 
devised.  These  metrics  include  the  size  of  the  search  space  and  the  focus  on  the  relevant 
portion  of  the  search  space.  The  values  for  these  two  metrics  are  determined  for  each 
validation  case.  The  following  table  shows  the  quantitative  comparison  of  WIZER  and 
RSM  for  the  CONSTRUCT  Validation  Scenario  III. 


Table 11. Quantitative Comparisons of WIZER and RSM

Metric | WIZER | RSM
Size of search space | 2 x 40 = 80 | At least 2^(4 x 4 / 2) x 40 = 256 x 40 = 10,240
Focus quality | 100% | At most 2 / 256

As shown, the size of the search space for RSM is at least 2^(4 x 4 / 2) = 256. This is
because there are 4 persons in management and they interact with each other symmetrically,
including with self, assuming the interaction is binary. (If they do not interact with self,
then the number of connections becomes 4 x 3 / 2.) The reason why it is “at least” 256 is
that for the minimum case the interaction matrix is assumed to be binary. The number of
possible states or cells or virtual experiments is 2^8 = 256. In reality, the interaction matrix
can contain any non-negative integer elements. Thus the size of the search space for RSM is
usually much larger. The focus quality for RSM displays the ratio between the necessary
search (two for WIZER, because one can take just one sample for each symbolic
category) and the size of the search space of RSM. As CONSTRUCT is stochastic, the size
of the search space in the table was multiplied by 40 to get statistically significant results.
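The arithmetic behind Table 11 can be reproduced directly. The sketch below follows the counting used in the text (4 x 4 / 2 binary interaction entries, 40 replications per cell, and two symbolic categories for WIZER); the variable names are chosen here for illustration only.

```python
managers = 4
replications = 40          # trials per cell, because CONSTRUCT is stochastic
symbolic_categories = 2    # one sample per symbolic category suffices for WIZER

# Counting used in the text: 4 x 4 / 2 symmetric binary interaction entries.
binary_entries = managers * managers // 2
rsm_cells = 2 ** binary_entries
rsm_runs = rsm_cells * replications

wizer_cells = symbolic_categories
wizer_runs = wizer_cells * replications

# Focus quality for RSM: necessary search divided by the RSM search space.
focus_quality_rsm = wizer_cells / rsm_cells

print(rsm_cells, rsm_runs)        # 256 10240
print(wizer_cells, wizer_runs)    # 2 80
print(focus_quality_rsm)          # 0.0078125
```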

The reason why WIZER is able to reduce the size of the search space is that the
ontological reasoning shows that the type of management can be either homogeneous or
heterogeneous. It is also because it does not really matter which permutation appears
in the interaction relationships amongst management, as they can be characterized as
either homogeneous or heterogeneous for the “what-if” scenario question. If we would
like to examine deeper questions such as what output a particular configuration of
interaction matrices would predict, then the size of the search space changes.

For complete validation of BioWar and CONSTRUCT, a naive RSM
implementation is intractable, as shown in the following table. This table gives estimates
based on a best-case estimate of the number of parameters of complete BioWar and
CONSTRUCT. In this estimate, BioWar has 200 parameters, while CONSTRUCT has
10. It is also assumed that each parameter has 3 value levels, for the optimistic case.


Table 12. Number of Cells for Naive RSM

Simulation Engine | #Cells for WIZER | #Cells for Naive RSM
BioWar | O(200 N) = O(N) | 3^200 = 2.6561e95
CONSTRUCT | O(10 N) = O(N) | 3^10 = 59,049

WIZER does not perform brute-force search on all parameter values. Its search steps are
guided by inferences on parameters. They go from one parameter value to another.
Furthermore, the change in parameter value can be discontinuous when the inference
dictates so. If each parameter is probed N times by WIZER and the number of parameters
is P, then the total number of searches is on the order of O(NP). As BioWar and
CONSTRUCT are stochastic, each cell needs 40 simulation trials to achieve statistical
significance. Thus the total numbers of simulations are 40 times higher than the numbers
of cells shown in the above table.
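The gap between the two columns of Table 12 is easy to check numerically. The sketch below uses the parameter counts, the three value levels, and the 40 trials per cell stated in the text; the number of probes per parameter, `n_probes`, is an assumed value chosen only to make the O(NP) estimate concrete.

```python
levels = 3
trials_per_cell = 40
n_probes = 10  # assumed N: probes per parameter by WIZER (illustrative only)

parameters = {"BioWar": 200, "CONSTRUCT": 10}

for engine, p in parameters.items():
    naive_cells = levels ** p      # full factorial design over all parameters
    wizer_cells = n_probes * p     # O(NP) inference-guided probes
    print(engine)
    print(f"  naive RSM cells:  {naive_cells:.4e}")
    print(f"  naive RSM trials: {naive_cells * trials_per_cell:.4e}")
    print(f"  WIZER cells:      {wizer_cells}")
    print(f"  WIZER trials:     {wizer_cells * trials_per_cell}")
```

Whatever value of N is assumed, the WIZER count grows linearly in the number of parameters, whereas the naive factorial count grows exponentially.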

Of course, in reality no one does naive RSM except for small problems.
Experimenters reduce the number of “core” parameters to consider based on sensitivity
analysis, policy consideration, and judgment calls. Section 6.7 (particularly Table 4) and
Section 7.5 (particularly Table 7) describe typical and non-naive RSM validations of
BioWar and CONSTRUCT. The following table summarizes the number of cells needed
for the typical validations.


Table 13. Number of Cells for Typical RSM

Simulation Engine | #Cells for WIZER | #Cells for Non-Naive RSM
BioWar (incidence factors case) | O(2 N) = O(N) | 36
CONSTRUCT (heterogeneous management case) | O(N) | 64

As shown, the number of cells for WIZER depends on the number of parameters
considered; for BioWar it is 3 parameters (ailment effective radius, ailment exchange
proximity threshold, and base rate), for CONSTRUCT it is 1 parameter (initial interaction
matrix). The number of parameters is smaller for a typical case (submodule) of validation
(the above table, Table 13) than for a complete validation (Table 12) as only subsets of
the parameter space and of the model are considered. Due to the experimenter's pruning of the
total number of “core” variables, doing RSM is feasible, though tedious, for parts of
BioWar and CONSTRUCT. This is a divide-and-conquer approach. Because of the
stochasticity of BioWar and CONSTRUCT, each cell requires 40 simulation trials to get
good statistical significance. This means the number of simulations using RSM for the
above typical BioWar validation is 36 x 40 = 1,440 simulation trials. For CONSTRUCT, the
number is 64 x 40 = 2,560 simulation trials. WIZER encodes the knowledge about how and why
“core” variables should be chosen in a format that computers understand and can process
automatically.


8.10 WIZER among Other Network Tools


As a knowledge-based and ontological reasoning tool, WIZER can be used to augment
other simulation and analysis tools. Existing network tools for dynamic network analysis
include AutoMap, ORA (Organizational Risk Analysis), and DyNet. The tools function
as follows:

o  AutoMap:  performs  network  relationships  extraction  from  textual  data, 
o  ORA:  performs  statistical  analysis  on  dynamic  networks  data, 
o  DyNet:  performs  simulation  of  dynamic  networks. 

WIZER  can  interface  with  DyNet  to  add  knowledge-based  and  ontological  reasoning  to 
the  simulation  of  dynamic  networks.  Through  Alert  WIZER,  WIZER  can  augment  the 
ORA  statistical  analysis  with  ontological  reasoning.  The  following  figure  shows  the 
interconnections  between  tools. 


Figure 19. WIZER Working Together with ORA and DyNet

As shown, WIZER performs inferences on DyNet simulations. The inferences can be for
validation and model-improvement purposes or for scenario analysis purposes. The
inferences  are  used  to  guide  DyNet  simulations.  WIZER  symbolically  and  ontologically 
characterizes  the  statistical  analyses  of  ORA  through  Alert  WIZER.  The  resulting 
symbolic  knowledge  is  then  used  for  reasoning  by  WIZER.  The  inferences  that  result 
from this reasoning can be used to guide ORA statistical analysis.


8.11 What WIZER Gains


The following table shows what WIZER gains when used for BioWar and
CONSTRUCT. The gain is compared against what normally transpires when humans do
the validation. The numbers are estimates based on simulation and validation experience.
The time it takes for WIZER (and the speed of WIZER) depends on computer speed,
memory, and storage capacity. As WIZER is a piece of software, everything it does is
limited by computer capabilities.


Table 14. WIZER versus Human Validation Gains

Aspect of Validation | BioWar by human | BioWar by WIZER | CONSTRUCT by human | CONSTRUCT by WIZER
Time to generate input data | Days if not weeks, due to the data access rights, usage policy, non-disclosure rules, privacy concerns, data ownership rights, and other problems. | Days if not weeks, and longer than what it takes if done by human, as the data needs to be formatted and prepared for computer processing | Days | Days and longer than what it takes if done by human, as the data needs to be prepared for computer processing
Number of points in response surface that can be estimated | 1 per 10 minutes | 20 per 10 minutes | 1 per 10 minutes | 20 per 10 minutes
Ability to handle qualitative data | Poor | Good, by mapping it to numerical range with added semantics | Poor | Good, by mapping it to numerical range with added semantics
Ability to compare means | 10 comparisons a minute | Many more comparisons (>600) a minute, limited only by computer speed | 20 comparisons a minute | Many more comparisons (>1200) a minute, limited only by computer speed
Ability to compare standard deviations | 5 comparisons a minute | Many more comparisons (>300) a minute, limited only by computer speed | 10 comparisons a minute | Many more comparisons (>600) a minute, limited only by computer speed
Number of data streams | One data stream examination per 15 minutes | Many more data stream examinations (>15) per 15 minutes, limited only by computer speed | One data stream examination per 15 minutes | Many more data stream examinations (>50) per 15 minutes, limited only by computer speed
Knowledge management | Difficult | Facilitated | Difficult | Facilitated
Number of rules processed | One per 5 minutes | 300 per 5 minutes | One per 5 minutes | 300 per 5 minutes
Number of causal relations considered | One per 5 minutes | 300 per 5 minutes | One per 5 minutes | 300 per 5 minutes
Common sense in selecting core variables | Implicit but good, depending on experience | Explicit and computer operable | Implicit but good, depending on experience | Explicit and computer operable
Use of statistical tools | Depends on experience | Encoded in the inference | Depends on experience | Encoded in the inference
Documentation of inference and experiment steps | Need extra work | Included in the inference trace | Need extra work | Included in the inference trace
Ability to explain simulation results | Depending on experience | Part of inference trace | Depending on experience | Part of inference trace
Enforced precision | No | Yes | No | Yes
Enforced clarity | No | Yes | No | Yes
Intuition | Yes | No | Yes | No
Learning | Yes | No, except for a rudimentary hypothesis building and testing | Yes | No, except for a rudimentary hypothesis building and testing
Model building capability | Depending on experience | No, only a basic model improvement ability | Depending on experience | No, only a basic model improvement ability
Thinking outside the box? | Depending on intelligence | No | Depending on intelligence | No
Man-hours | Large | Medium-to-Large | Large | Medium
Retention of knowledge | Depends on personnel | Facilitated | Depends on personnel | Facilitated
Large problem solving | Possible, e.g., by careful analysis | Facilitated | Possible | Facilitated
Policy scope taken into account? | Yes, written | Yes, encoded and processable by computers | Yes, written | Yes, encoded and processable by computers
Ability to handle quantitative data | Good, assisted by computers especially for large numbers, complex equations, and extensive networks | Yes | Good, assisted by computers | Yes
Visualization of the data | Need computer assistance | Not implemented yet, but feasible | Need computer assistance | Not implemented yet, but feasible
Exception handling | Good, depending on experience | Must be and can be encoded | Good, depending on experience | Must be and can be encoded


8.12 Summary


This chapter talks about the strengths of WIZER, which include the capability to reduce
and narrow the search for the purpose of validation. It also talks about WIZER's
weaknesses, which include the lack of model/causal learning from empirical data. It gives
comparisons of WIZER against RSM and against subject matter expert approaches.
The usability of WIZER among the existing social network tools of ORA and DyNet is
outlined.




Chapter  IX:  WIZER  from  a  Computer 
Science  Perspective 


This chapter talks about WIZER from a Computer Science and Artificial Intelligence
perspective. WIZER is a knowledge-based and ontological reasoning system for the
validation and model-improvement of simulations.

WIZER advocates the centrality of hypothesis formation and testing in reasoning
systems. In Computer Science and Artificial Intelligence, the task of mimicking a
scientist's work is relegated to the subfield of scientific discovery. Hypothesis
formation and testing is not recognized as one of the most important reasoning
methods. (Bayesian networks have hypothesis formation but only in the sense of
Bayesian conditionals.) Additionally, causal and ontological reasoning is important.
Underlying causal and ontological reasoning is process reasoning/logic.

If the history of science is a guide, scientific progress depends on
hypothesis formation and testing - in addition to observation. While reinforcement
learning, case-based reasoning, genetic algorithms, first-order logic, second-order logic,
Bayesian networks, and other reasoning methods in Artificial Intelligence are useful, they
are not employed in scientific work as the primary method. Inductive reasoning
employed in scientific discovery is one exception. In order to reason more effectively,
however, deductive logic and probability theory are more definitive than inductive
reasoning.


9.1  Process-based  Logic 


Logic is an attempt to describe normative or correct reasoning. It includes propositional
logic, predicate (first-order) logic, second-order logic (e.g., situation calculus), and causal
logic. Logic depends on the correctness of the premise and the entailment operator to
derive a correct conclusion. Any error in the assessment of the premise and the
entailment results in an incorrect conclusion. Compounding of premise variables also
complicates the derivation of a correct conclusion.

In  the  real  world,  logic  must  be  based  on  reality.  People  do  not  reason  in  a 
vacuum.  There  are  always  entities  with  properties  and  behaviors,  relationships,  and 
processes.  Without  real  contexts,  the  logical  inference  can  be  made  to  deduce  anything. 
Causal logic, a part of logic, is the logical formalism closest to reality. Underlying
causal logic are descriptions of processes and mechanisms. There is a chasm between
Computer  Science  and  natural  sciences  like  physics.  In  physics,  researchers  focus  on 
underlying  processes  and  mechanisms.  In  Computer  Science,  researchers  focus  on  logic, 
representation,  and  algorithms  (including  control  and  vision  algorithms  in  robotics). 

A  new  kind  of  logic  provides  a  foundation  for  propositional,  first-order,  and 
second-order  logic.  This  logic  is  called  process-based  logic  or  process  logic,  as  it 
describes  processes  and  mechanisms  instead  of  just  truth  values,  predicates,  functions, 
and  causality.  This  logic  augments  the  premises,  the  entailment  operator,  and  the 
conclusions  with  process  and  entity  descriptions.  In  creating  process  logic,  processes  are 
modeled  and  then  augmented  with  semantic  information  and  ontology.  Modeling 
processes  and  entities  with  properties  and  behaviors  can  be  effectively  done  with 
simulations.  Thus  simulations  capture  the  structures  of  the  real  world  for  logical 
reasoning.  Augmented  with  ontology  and  knowledge  base  (which  is  to  say,  process 
ontology),  the  process  logic  is  reflected  in  simulation.  The  following  figure  illustrates  the 
relationships  between  conceptual  model,  implementation,  process  logic  describing 
processes/mechanisms,  causal  logic  describing  causal  relations,  and  if-then  rules 
describing  process-based  and  thus  conceptual-based  changes  to  the  model  and  parameter 
values. The arrows in the picture represent the notion of “is derived from”.

Figure 20. Process Logic and Its Derivation

Process logic denotes the change and the processes of change from one entity (or
one entity value) to another in the conceptual model and/or in the implementation. It
starts out with the process model. Augmenting the process model with symbolic and
semantic information relevant to the model produces the process logic. By definition,
process logic is the sequence of ordered events based on the process model augmented
by semantic information and ontology.

Abstracting the process logic into human-friendly causal language yields causal
relations. Causal relations abstract the mere change into a meaningful semantics of
causes and effects. The if-then rules describe the adjustments of the values of the
variables in the causal relations and/or process logic based on empirical data. The rules
are tied to the causal relations and/or process logic. The code/implementation can be in
the form of simulations.

As an example, let us examine the process model and the process logic for
smallpox. Smallpox has the incubation period, the initial symptom (prodrome) which lasts
2-4 days, early rash for about 4 days, pustular rash for about 5 days, pustules and scabs
for about 5 days, resolving scabs for about 6 days, and then finally resolved scabs. The
process model for smallpox is illustrated in the following figure.


Figure  21,  Process  Model  for  Smallpox 


As  shown,  smallpox  progresses  in  roughly  an  orderly  sequence  of  events. 

The  process  logic  based  on  the  process  model  can  be  written  by  using  the 
modified  N3  notation  (with  the  addition  of  the  sequence  primitive)  as  follows. 

<sequence> <begins> <null> .
<sequence> <based on> <periods> .

<period> <numbers> <1> .
<period> <is> <incubation> .
<period> <has length of> <7 to 17 days> .
<incubation> <is> <non contagious> .
<incubation> <has symptoms of> <null> .

<period> <numbers> <2> .
<period> <is> <prodrome> .
<period> <has length of> <2 to 4 days> .
<prodrome> <is> <contagious> .
<prodrome> <has symptoms of> <fever, malaise, headache, body ache,
    vomiting> .

<period> <numbers> <3> .
<period> <is> <early rash> .
<period> <has length of> <about 4 days> .
<early rash> <is> <most contagious> .
<early rash> <has symptoms of> <red spots on the tongue, red spots in the mouth,
    rash everywhere on the body, reduced fever, rash becoming bumps,
    bumps filled with a thick opaque fluid with bellybutton-like depression
    in the center, fever rising again> .

<period> <numbers> <4> .
<period> <is> <pustular rash> .
<period> <has length of> <about 5 days> .
<pustular rash> <is> <contagious> .
<pustular rash> <has symptoms of> <bumps becoming pustules> .

<period> <numbers> <5> .
<period> <is> <pustules and scabs> .
<period> <has length of> <about 5 days> .
<pustules and scabs> <is> <contagious> .
<pustules and scabs> <has symptoms of> <pustules starting to
    form a crust, scabs> .

<period> <numbers> <6> .
<period> <is> <resolving scabs> .
<period> <has length of> <about 6 days> .
<resolving scabs> <is> <contagious> .
<resolving scabs> <has symptoms of> <falling scabs> .

<period> <numbers> <7> .
<period> <is> <resolved scabs> .
<period> <has length of> <an instant> .
<resolved scabs> <is> <noncontagious> .
<resolved scabs> <has symptoms of> <all scabs gone> .

<sequence> <ends> <null> .

While not shown in the above example, the sequence also allows the specification of the
decision flow in the form of “if-then-else”. The sequence for the process logic is
implemented as an ordered traverse in (the semantic networks of) simulation knowledge
space and domain knowledge space.
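One way to picture the ordered traverse is as a walk over an ordered list of period records. The Python sketch below encodes a subset of the smallpox statements above. The `Period` class, its field names, and the helper functions are illustrative assumptions and do not reflect WIZER's internal data structures or its semantic-network implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Period:
    number: int
    name: str
    length_days: str      # kept as text, e.g. "7 to 17" or "about 4"
    contagious: bool
    symptoms: List[str]


# Subset of the smallpox sequence, listed in order of occurrence.
SEQUENCE = [
    Period(1, "incubation", "7 to 17", False, []),
    Period(2, "prodrome", "2 to 4", True,
           ["fever", "malaise", "headache", "body ache", "vomiting"]),
    Period(3, "early rash", "about 4", True,
           ["red spots on the tongue", "rash becoming bumps"]),
    Period(4, "pustular rash", "about 5", True, ["bumps becoming pustules"]),
]


def next_period(current: Period) -> Optional[Period]:
    """Ordered traverse: return the period that follows, or None at the end."""
    for candidate in SEQUENCE:
        if candidate.number == current.number + 1:
            return candidate
    return None


def contagious_periods() -> List[str]:
    """Names of all periods marked contagious, in sequence order."""
    return [p.name for p in SEQUENCE if p.contagious]


step: Optional[Period] = SEQUENCE[0]
while step is not None:
    status = "contagious" if step.contagious else "non contagious"
    print(step.number, step.name, step.length_days, status)
    step = next_period(step)
```

An “if-then-else” decision flow could be added by letting `next_period` choose among several successors based on a condition, though that extension is not shown here.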


9.2  Probabilistic  Logic 


Probabilistic logic, the intersection of probabilistic reasoning and logical representation,
has become an active research area in Artificial Intelligence. The research in probabilistic
logic pursues the integration of deductive logic and probabilistic reasoning. The
brittleness of symbolic logic (e.g., first-order logic) has led to the choice of statistics -
particularly Bayesian statistics - to tackle Artificial Intelligence problems. The statistical
paradigm, however, has an inherent weakness of being unable to support the domain
and/or structured knowledge and the wealth of inferences in logic. The view behind the
probabilistic logic research in Artificial Intelligence is that logic and probability are
enough for representing the real world. (A related subarea called probabilistic logic
learning looks at how to learn the logical and probabilistic formalisms and values using
machine learning.)

WIZER points to what is missing in this view: the importance of modeling and
simulation, the need to focus on natural processes instead of just pure logic, and the
significance of hypothesis formation and testing. Augmented by causal, process, and
ontological reasoning, WIZER supplies knowledge structure for statistics through
simulation models and validated simulations. It provides robustness for logical reasoning
in the form of statistical calculations constrained by simulations (after simulation
validation with empirical data).


9.3 Logic, Probability, and Structure of the World


The  majority  of  work  in  Artificial  Intelligence  focuses  on  devising  smart  representations 
and  algorithms  to  mimic  part  of  human  intelligence.  Simulations  are  not  considered  an 
essential  part  of  this  endeavor.  Simulations  have  great  successes  in  mimicking  complex 
systems.  Consequently,  simulation  -  and  simulation  modeling  -  is  a  great  way  to 
represent  systems.  Expert  systems,  while  being  part  of  Artificial  Intelligence,  are  also 
researched  separately  from  simulations. 

Artificial  Intelligence  research  went  through  several  phases  throughout  several 
decades:  symbolic  logic  phase  in  the  70s  and  80s,  connectionist  phase  in  the  90s,  genetic 
algorithm  phase  in  the  90s,  and  probabilistic/statistical  phase  during  this  decade  (the 
2010s). As of the time of this writing, however, there is a revival of the trend toward
knowledge-based methods, especially for the Semantic Web.

Current work in logic is addressing problems such as the brittleness of first-order
symbolic logic. The recent statistical/probabilistic phase - especially Bayesian statistics - is
evidence of the unfulfilled promise of symbolic logic. The failed Japanese Fifth
Generation Computer Systems project and the lukewarm Cyc project illustrated the
difficulty of scaling up symbolic logic and of making logic not brittle. Probability and
statistics, however, cannot handle well the structures of knowledge and the inferences of
logic.

Logic,  while  powerful,  derives  its  power  from  accurate  representations  of  the 
world.  As  an  example,  while  biologically  a  cat  is  a  mammal,  the  correct  first-order  logic 
declaration  in  the  context  of  society  is  that  a  cat  is  a  pet.  Statistics,  while  powerful  and 
robust,  does  not  form  an  accurate  representation  of  the  world  and  cannot  handle  symbolic 
information  well.  The  structure  of  the  world  and  the  structure  of  the  knowledge  about  the 
world  cannot  be  represented  by  statistics.  For  this,  we  need  simulations.  Modeling  and 
simulation  can  mimic  the  real  world  closely.  It  can  mimic  complex  processes.  This 
indicates  that  to  be  successful  in  achieving  real-world  logical  reasoning,  it  is  necessary  to 
have simulation as an essential component in addition to logic and statistics. The
empirical view of the world suggests that it is the empirical process that is
fundamental, rather than logic. Validated simulations mimic real world processes.
WIZER thus facilitates the connection between statistics and logic through validated
simulations.

Instead of logical reasoning, the simple but profound scientific process of
hypothesis building and testing - the scientific method - is fundamental. While logic is
utilized in hypothesis building, knowledge accumulation in science is achieved by
carefully constructing and testing hypotheses. If logic is used without the empirical check
of hypothesis testing, the inference may look valid but be empirically wrong. Both the
premise and the inference rule must be empirically correct to allow empirically valid
inference. Logic also depends on propositions being true or false. Attempts at multi-value
logic and fuzzy logic have not produced sound reasoning formalisms. Here WIZER also
facilitates the construction of hypotheses and the testing of hypotheses in simulations as
a proxy to the real world. It provides an empirical foundation through validated
simulations on which logical reasoning is based.


9.4  Empirical  Path  toward  Artificial  Intelligence 


The field of Artificial Intelligence has attempted to mimic human intelligence for at least
five decades. The approaches to achieve artificial intelligence include logical,
connectionist, and statistical (Bayesian) approaches. Outside scientific discovery,
however, little attention is paid to the fact that human scientists gather knowledge by
hypothesis generation and testing, which is to say, by the scientific method. Without the
concept of falsifiable and testable hypotheses of the scientific method, the acquisition of
new knowledge has been slow and error-prone. Science focuses on elucidating entities
and processes/mechanisms, not just logical entailments. Thus, it may make sense to focus
on processes/mechanisms to achieve artificial intelligence. I call this the empirical path
toward artificial intelligence. Simulation is one of the most appropriate tools to mimic
processes/mechanisms (the other being mathematics). It may take validated simulations
with the capability of building and testing hypotheses for simulation model improvement
to achieve artificial intelligence. While logic can represent other formalisms, simulations
have the virtue of being able to add robustness through their statistical computations tied
to the simulation model.

We live in an era that is data rich but knowledge and inference poor in many scientific
fields, especially in economics, business, and bioinformatics/computational biology. Data
are inexpensive. From data, causal models can be constructed by causal learning/discovery
algorithms. Simulation/process models can be improved by hypothesis building and
testing.

9.5 Summary


This chapter shows that validated simulations, the result of WIZER, can function as the
connector between statistics and logic, as validated simulations represent the structures
of the real world closely and add robustness to logical reasoning through the statistical
computations tied to the simulation model. Structured knowledge and statistics can be
captured in simulations and be made operable. This allows robust logical reasoning
(including causal and process reasoning). As this era is blessed with rich data,
high-fidelity simulations are feasible (validated with rich data and knowledge). Using
machine learning and data mining techniques, knowledge can be learned and/or extracted
from data.

Chapter X: Causality, Simulation, and
WIZER


Causality is an important concept for humans and other living beings. Whether the real
world is causal is debatable (quantum mechanics is an excellent example of
non-causality). Underlying causality are physical processes and mechanisms. In physics,
the fundamental laws of nature are expressed in continuous systems of partial differential
equations. Yet the words and concepts that are used to talk and reason about causes and
effects are expressed in discrete terms that have no direct relationship to theories of
physics. This chapter describes the state of the art of causal modeling. It advances
validated simulations through WIZER as a better method to do causal modeling,
inference, and analysis.


10.1  Causal  Modeling  and  Analysis 


Causality is an approximation of orderliness in the macro-level universe even though the
micro-level universe underpinning it is a causation-defying quantum universe. Squirrels
bury nuts for the winter. People plan daily trips to work or shop. The success of these
activities does not directly depend on theories of physics, but it indicates that the world
is sufficiently orderly that a rough rule of thumb can be a useful guide. Causal relations
represent one such rule of thumb.

Being able to make causal predictions about the world is beneficial, so much so
that causality has become an integral part of the human worldview and language. Causal
relationships are even sometimes assumed as facts without any conscious thought. People
form causal relationships based on perception or estimation of order or regularity in the
random world. Causal relationships are not without pitfalls. People believe in many
spurious causal relationships and the effect is considerable. Empirical elucidation of the
processes or mechanisms behind a causal relationship is needed to ascertain its
correctness. In addition to causal reasoning, process-based and empirical reasoning is
crucial.

Causal relationships are modeled by directed graphs (Greenland and Pearl 2006).
Causal models have been known as structural-equations models (Kline 2004) in
economics, behavioral science, and social sciences, which are used for effect analysis.
The causal diagrams in the form of directed graphs depict causal relationships. The
following figure shows an example of causal diagrams. The arrow denotes the causal
dependency.

Figure 22. Simple Causal Diagram

As shown, A and B are independent, while C is directly dependent on B. E is directly
dependent on C and B. D is directly dependent on C. E is indirectly dependent on A. The
causal relations depicted above are assumed to be deterministic. But causal diagrams
such as the above can be reinterpreted formally as probabilistic models or
Bayesian network models to account for uncertainty. This is the first major advance of
causal inference: from deterministic causality to probabilistic causality. The causal
diagrams can further be reinterpreted as a formal tool for causal inference. This
represents the second major advance of causal inference: from a descriptive diagram of
causality to actually using the diagram as a means to do causal reasoning.
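
As an illustration of using a diagram as an inference tool rather than a picture, the sketch below encodes only the direct dependencies that the text states explicitly (B to C, C to E, B to E, C to D) and derives indirect dependencies by graph traversal. This is a minimal sketch, not part of WIZER: the names `EDGES`, `build_parents`, and `ancestors` are hypothetical, and because the edge that makes E indirectly dependent on A cannot be recovered from the extracted text, A is left without outgoing edges here.

```python
from collections import defaultdict

# Direct causal dependencies stated in the text, written as (cause, effect).
# Assumption: any edge leaving A is omitted because it is not recoverable
# from the text, so A appears as an isolated node in this sketch.
EDGES = [("B", "C"), ("C", "E"), ("B", "E"), ("C", "D")]


def build_parents(edges):
    """Map each effect to the set of its direct causes."""
    parents = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)
    return parents


def ancestors(node, parents):
    """Return all direct and indirect causes of a node."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


PARENTS = build_parents(EDGES)
direct_causes_of_e = set(PARENTS["E"])      # direct dependencies of E
all_causes_of_d = ancestors("D", PARENTS)   # direct and indirect dependencies of D
```

With these edges, D depends directly on C and indirectly on B. A probabilistic reinterpretation would attach a conditional probability table to each node given its direct causes, which is the step that turns the diagram into a Bayesian network.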

Causal diagrams are assumed to be Markovian. Causal analysis, which deals with
what inference one can draw from several causal statements, is based on directed graphs,
the notion of d-separation, and the Markovian assumption (Pearl 2000). Causal analysis
makes the initial assumptions of which variables are endogenous (to be examined in
causal reasoning) and which ones are exogenous (to be assumed away as the environment
or noise). Causal relations can be extracted from data by using causal Bayesian network
learning.


10.2  Causality  and  Process 


Causal relations are constructed by humans to estimate some kind of order from physical
processes. They are sometimes wrongly constructed. For example, it was wrongly
believed that severe illness is caused by depression and/or anger. Without clear
underlying mechanisms or processes, causality can still be useful (e.g., if causes of
certain diseases are known but not the disease mechanisms inside a human body, a
remedy can still be given by addressing the causes) but is risky. It is better, of course, if
the underlying processes are elucidated. Even if the underlying processes are clear,
causality is still needed to facilitate human understanding and use. This is similar to how
a higher-level computer language encapsulates machine-level binary code.


10.3  Causality  and  WIZER 


Instead of relying on directed graphs, Bayesian networks, and the Markovian assumption
to elucidate causality, WIZER utilizes validated simulations. Bayesian networks used to
model causality in the form of causal Bayesian networks fundamentally suffer from the
prior specification problem, the conditional dependence correlations, the inability to take
into account the excluded middle, the disconnect with what human scientists normally do
in their scientific work, the lack of knowledge and ontological inference, and the
requirement for large enough samples to be meaningful. Validated simulations can depict
more accurately the many variables and their potential interactions that could complicate
causal and/or Bayesian reasoning. They are also able to model individual-based
causations and show the cumulative effects (or the emergence) on the sample populations.
Validated simulations represent real world processes. Causality can be thought of as a
simple search for regularity in the real world processes, resulting in an approximation or
a simple rule of cause-and-effect "regularity". WIZER allows the grounding of causal
relationships on processes and mechanisms as emulated by the validated simulation and
on empirical data. As all causal relations are empirical, this capability of grounding
inferred or conceptual cause-effect relations is important. The following table shows the
comparison between graph-based and validated-simulation causality representation.


Table 15. Causality by Graph versus by Validated-Simulation

Aspect | Graph-based | Validated-Simulation-based
Causal relation representation | An edge in the graph | Simulated processes underlying the causal relation
Uncertainty assessment | Conditional probability with Markovian assumption | Detailed process simulation
Allow symbolic information? | No | Yes
Structured knowledge taken into consideration, other than the causal structures | Not in the probability assessment of a causal relation | Yes, including in the assessment of a causal relation
Abstract away minor factors? | Yes | Yes, but much less so
Knowledge inference? | No | Yes
Realism/believability? | Not good | Good
Exception handling | Difficult | Incorporated
Individual to population causality "emergence" | Cannot be modeled | Modeled in detail
Determination of exogenous factors | Determined a priori | All factors (as many as feasible) modeled and the exogenous factors are shown as having minimal or no impact on the causal relation


10.4 Summary


This chapter discusses causality and its graph-based modeling. It also discusses how
validated simulations and WIZER can supply higher-fidelity causal relations than causal
analysis using directed graphs alone.

Chapter XI: Potential Extensions and
Implications of WIZER


This  chapter  talks  about  the  potential  extensions  of  WIZER.  By  potential  extensions  I 
mean  the  technological  and  conceptual  extensions.  The  latter  part  of  this  chapter  talks 
about  the  implications  and  applications  of  WIZER  in  diverse  fields. 


11.1  Toward  a  Simulation  and  Knowledge  Web 


The Semantic Web (Davies et al. 2003) is currently the next generation web. Unlike the
current World Wide Web, the information in the Semantic Web is engineered in such a
way as to be easily processed by computers on a global scale.

As validated simulations and their semantic descriptions are made feasible by
WIZER, it is now possible to use the semantic descriptions - and some additional
resource-allocation ontology - to create a Simulation Web. Instead of focusing on the
structures of knowledge, the Simulation Web allows the organic real world dynamics to
be captured. As validated simulations imply validation knowledge, the Simulation Web
produces the Knowledge Web. The Simulation Web and the Knowledge Web should be
able to:

1. Ground any ontology or semantics on validated simulations based on empirical
data. Ontological engineering deals with the issue of ontology construction and
conflicts in ontologies. What an ontology really means can be made empirical by
validated simulations. This facilitates the resolution of ontological conflicts and
provides an essential context and foundation on which ontologies are built.

2. Examine any data critically through validated simulations.

3. Intelligently extract knowledge from validated simulations.

4. Distribute simulation tasks over the Internet based on semantics or ontology.

5. Perform not only logical inference but also process-based and empirical-based
inference.

6. Produce in-depth knowledge or knowledge grounded in empirical reality.

The  modified  N3  notation  adopted  for  Simulation  Description  Logic  of  WIZER 
incidentally  shows  it  is  not  conceptually  difficult  to  interface  simulations  with  the 
Internet.  The  simulation  only  needs  to  be  ontologically  described  with  appropriate 
knowledge  bases  and  inference  mechanisms.  Once  the  ontology  is  tied  with  the 
simulation,  the  N3-like  description  of  simulations  and  of  simulation  results  can  be  shared 
through  the  Internet.  More  sophisticated  simulation  sharing  includes  distributing 
simulations  by  their  components  throughout  the  Internet.  This  would  turn  the  Internet 
into  one  hypercomputer.  The  distribution  of  simulations  is  more  appropriate  for  social 
systems  where  components  are  relatively  loosely  coupled  than  for  fluid  mechanics,  for 
example.  This  is  because  the  Internet  connections  incur  delays  which  are  substantial  for 
vector  or  tightly-coupled  applications.  Issues  of  access  rights,  privacy,  load-balancing, 
and  others  form  intriguing  research  subjects. 


11.2  Component-based  Multi-scale  Super-simulations 


Integrated circuits and the automobile are the epitome of the success of
component-based system building. Similar approaches may prove to be fruitful for building
realistic simulations of many systems. Additionally, multi-scale simulations combine
components from various physical scales (e.g., diabetes expression simulation with public
health simulation). Related to this component-based simulation building is docking, which
validates a simulation against another previously validated simulation.

In  the  modeling  and  simulation  field,  this  composable  system  of  systems  approach 
for  interoperability  of  diverse  simulators  is  called  Federated  Simulation  Systems. 
Federated  Simulation  Systems  have  the  following  concepts: 

•  A federation comprising a collection of simulators (federates).

•  Time-stamped event-based interactions between federates.

•  Standardization for common objects and events.

•  Scalability via parallel and distributed simulation techniques.
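
The time-stamped, event-based interaction between federates listed above can be sketched as a single priority queue that delivers events in timestamp order. This is an illustrative sketch only and does not follow any particular federation standard: the `Event` and `Federation` classes, the two federates (`disease_model`, `hospital_model`), and the two-time-unit admission delay are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """A time-stamped message; only the timestamp takes part in ordering."""
    timestamp: float
    sender: str = field(compare=False)
    receiver: str = field(compare=False)
    payload: str = field(compare=False)


class Federation:
    """Delivers events to registered federates in timestamp order."""

    def __init__(self):
        self.queue = []       # min-heap of pending events
        self.federates = {}   # federate name -> handler function
        self.log = []         # record of delivered events

    def register(self, name, handler):
        self.federates[name] = handler

    def send(self, event):
        heapq.heappush(self.queue, event)

    def run(self):
        while self.queue:
            event = heapq.heappop(self.queue)
            self.log.append((event.timestamp, event.receiver, event.payload))
            # A handler may react by emitting follow-up events.
            for follow_up in self.federates[event.receiver](event):
                self.send(follow_up)


def disease_model(event):
    # Hypothetical rule: an infection causes an admission two time units later.
    return [Event(event.timestamp + 2.0, "disease", "hospital", "admission")]


def hospital_model(event):
    # This federate only consumes events in the sketch.
    return []


federation = Federation()
federation.register("disease", disease_model)
federation.register("hospital", hospital_model)
federation.send(Event(5.0, "environment", "disease", "infection"))
federation.send(Event(1.0, "environment", "disease", "infection"))
federation.run()
delivered_times = [timestamp for timestamp, _, _ in federation.log]
```

Even though the event at time 5.0 is submitted first, the federation delivers the event at time 1.0 and its follow-up before it. Note that the payloads here are bare strings; nothing in the exchange carries the semantics or assumptions of either federate.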

Current  bottlenecks  in  integrating  many  simulation  systems  and  in  using 
simulations  as  components  lie  in  the  difficulty  of  getting  the  semantics  and  assumptions 
of  the  components  to  match.  This  spurred  the  work  on  simulation  interchange/format 
standardization.  Time-stamped  events  provide  a  primitive  way  for  interactions  between 
federates.  The  difficulty  of  getting  the  semantics  right  is  partly  caused  by  not  making  all 
assumptions  explicit  and  operable.  In  addition,  the  results  have  not  been  put  in  consistent 
knowledge  bases  that  could  be  automatically  reasoned  with.  WIZER  can  remedy  the 
above  two  issues.  It  can  also  provide  a  more  sophisticated  interaction  method  for 
federates  in  lieu  of  the  time-stamped  events.  The  simulations  or  simulation  components 
will  all  have  symbolic  or  semantic  descriptions  of  them.  WIZER  can  pave  a  way  to  the 
realization  of  component-based  multi-scale  super-simulations.  If  these  super-simulations 
are  valid  and  detailed  enough,  they  may  form  the  foundation  for  software  and  robotic 
systems  that  understand  the  real  world.  Needless  to  say,  these  systems  will  have  immense 
utility. 


11.3  WIZER  and  Knowledge  Assistant 


As WIZER facilitates validated simulations, the knowledge behind the simulations
becomes clearer and more interactive for human users. Currently, when someone tries to
find specialized knowledge to understand what kind of materials he/she should choose
to use for building his/her home, for example, he/she is forced to delve into technical
papers describing the materials if he/she refuses to be guided by the commercial and
advertising information alone. Reading and understanding technical papers written in
technical terms for professionals is hard. Here WIZER, with its validated simulations,
comes to help. Instead of simple technical papers and commercial information,
"knowledge startups" will build validated simulations complete with symbolic
(knowledge-based) and graphical interfaces for the end users. These validated simulators
will take the form of software packages much like tax-preparation software today. In the
future, when someone tries to find how best to build a home, he/she will purchase this
knowledge-assistant package and use its intertwined validated simulation and knowledge
inference for speedier understanding of difficult subjects. WIZER provides the
foundation for such knowledge assistants.

Validated  simulations  can  also  be  used  as  a  means  to  communicate,  augmenting 
video  and  human  speech.  Effective  communication  depends  a  lot  on  the  context. 
Validated  simulation  can  capture  such  a  context.  Today,  communication  is  limited  by 
language  and  cultural  barriers.  If  one  wants  to  communicate  what  it  is  like  to  be  living  in 
the  real  and  current  Costa  Rica,  for  example,  one  can  get  some  rough  sense  of  it  by 
reading  (here  is  the  language  barrier),  talking  to  people  (language  and  contextual 
barriers),  seeing  pictures  (limited  knowledge-based  explanation  for  them),  or  watching 
tourist  movies  (this  kind  of  movie  is  limited  and  movies  are  non-interactive  or  have 
limited  interactivity  -  changing  several  scenarios  at  most).  One,  however,  cannot  tailor 
the  movies  to  his/her  specific  circumstance  nor  can  one  really  get  the  feeling  of  living  in 
Costa  Rica.  Simulations  have  been  used  for  combat  training  purposes  to  give  trainees  the 
feel  of  various  combat  situations.  WIZER  through  validated  simulations  enables  more 
effective  and  detailed  communication  among  people  and  across  cultures. 


11.4  WIZER  and  Ontological  Engineering 


People construct different and often conflicting ontologies, partly due to the fact that an
ontology does not exist in a vacuum (it is constrained by social and cultural contexts, for
example). One way to fix this is to have empirical grounding of the defined meanings in
ontologies. WIZER provides this empirical grounding through validated simulations.


11.5 WIZER and Policy Analysis


Policy Analysis uses numerous computational models, particularly economic models.
Most policies and their driving politics are now governed using human languages, which
are inadequate for objective, transparent, and accurate discourse and analysis, as the
languages contain ambiguities and are loaded with historical, cultural, and emotional
elements. This is not to say that historical, cultural, and emotional elements are not
important, but they need to be explicitly noted to facilitate clear reasoning and
understanding. The law witnesses the tailoring or formalization of a portion of human
languages to try to eliminate ambiguities and misunderstanding, but it requires
professionals to interpret them, thus still leaving room for ambiguities, misunderstanding,
and misapplication.

Imagine people being able to discern the policies and laws through realistic
movie-like simulations based on validated models. If we read through the 396-page US
bird flu plan, we are left with a sense of a good plan with nothing to worry about, but no
clear idea of what would really transpire, especially on the all-important questions of
"What will happen to my family and me? How, where, when, from whom exactly could
we get help?" Imagine people being able to walk through and play around with the bird
flu plan just like playing games. This is possible through validated models and
simulations, which WIZER facilitates.

Human-language plans leave too much uncertainty and ambiguity, both of which
are fundamentally detrimental to the success of plans, especially ones whose success
depends on individual behaviors. Plan writers consciously or unconsciously incur a
positive-image bias in the plan. Imagine authorities providing people with not just written
plans, but validated simulators. Besides, nobody wants, has time, or is able to read
through hundreds or thousands of pages of documents, but almost everybody likes to
watch movies and play games. Validated simulations thus provide a more natural user
interface (combined with 3D movie interactive presentation) to understand, analyze, and
design policies and regulations.

The messy response to Hurricane Katrina in 2005 indicates that all the written
texts on policies and regulations have never been validated (to see how they all work with
each other, for example). Validated simulations of all the policies and regulations in the
context of a disaster would have made clear all the deficiencies. Thus WIZER facilitates
the improvement of regulations, policies, and legislation through validated simulations.


11.6  Localization  and  Instantiation  of  Large 
Simulations  for  Decision  Making 


In  large  simulations  such  as  BioWar,  the  simulations  are  constructed  with  a  general  set  of 
parameters.  They  are  developed  with  one  or  two  test  cases.  In  BioWar,  for  example,  the 
simulation  is  developed  with  respect  to  five  seed  cities.  By  instantiation  and  localization, 
I  mean  the  deployment  of  simulation  to  other  cases:  in  case  of  BioWar,  to  other  cities. 
WIZER  can  facilitate  the  parameter  adjustments  and  the  validation  of  simulations  to 
instantiate  and  localize  the  simulations. 


11.7  WIZER  for  Organization  and  Management 


The way companies and societal systems are currently managed is based on case studies
and management lessons expressed in human languages, with computational tools used
only where necessary. With the advent of Computational Organizational Theory and
Computational Management Science, almost every aspect of organizations and
management can now be modeled computationally and inferentially. For example,
management knowledge can be computationally and inferentially modeled. Business
process design, operations management, and decision making do not happen in a vacuum,
but within a context of organizational, legal, media, financial, societal, and technological
background. In this era of globalization, electronic-commerce, and mobile-commerce, the
background becomes much more a determinant of success for any business and
management plan.

Organizational modeling and simulation is mostly quantitative. To improve upon
the quantitative organizational modeling and simulation, WIZER contributes (symbolic)
knowledge inference and validated simulation to the organizational modeling and
simulation. Closely related to organizations are networks, including social networks, of
which WIZER could facilitate the validation too.

On the other hand, knowledge management focuses exclusively on ontology and
knowledge bases. Here WIZER contributes validated simulations to ground
business/management rules on empirical data. Knowledge management includes the
management of knowledge capital. WIZER facilitates knowledge management by the
nature of its ability to handle symbols, numbers, and simulations.

As an organization is a knowledge entity, focusing on the nature, structure, and
dynamics of knowledge in an organization may shed light on organization performance
problems. WIZER can assist in analyzing organization performance by looking into what
knowledge resides where and how it is transformed and exchanged in the organization,
instead of just looking at the organizational structures, tasks, leadership, etc. The case of
Enron is a good example. Enron had the same organizational structure and tasks as many
other companies. Even the accounting seemed to be similar to other organizations in terms
of the system and the numbers. Only by carefully examining what is unusual about the
knowledge in Enron and about Enron can one ascertain whether Enron is a company in
good standing or not. As an example, the knowledge about the multiplication of Special
Purpose Vehicles should have triggered an alert among analysts.


11.8  WIZER  and  Biomedical  Informatics 


Biomedical informatics deals with all aspects of understanding and developing the
effective organization, analysis, management, and use of information in health care.
Hospital organization and care administration is complex, so much so that it is currently
labor-intensive. While using standard protocols has its merits, in some cases they break
down. Validated simulations can provide insights and possibly remedies to problems in
the organization and management of care. Here WIZER facilitates the validation of
simulations. It improves the confidence in the use of simulations, the ease with which
simulations are validated and improved, and the ease with which simulation, model, and
domain/empirical knowledge are managed.

Particularly  urgent  in  biomedical  informatics  is  finding  a  solution  to  the  pervasive 
and  persistent  problem  of  medical  errors.  While  training  and  use  of  standard  protocols 
help,  they  are  insufficient  as  medical  errors  still  occur  with  a  significant  frequency.  This 
dissertation  shows  an  alternative  way  to  address  medical  errors:  by  using  validated 
simulations  for  systems  of  interest.  The  Agency  for  Healthcare  Research  and  Quality 
(AHRQ)  states  that  the  single  most  important  way  to  prevent  errors  is  for  the  patient  to 
become an active member of his/her health care team. This is good advice, provided
that the patient is knowledgeable and not gullible. One patient's experience of having
learned much information about a sports surgery before deciding whether or not to have
one demonstrates that confusion still ruled, and in the end the decision was made by
weighing factors such as the strength of a doctor's persuasion, trust, and a doctor's
reputation. There were no clear reasoning steps before the decision; a simple random leap
of faith might have played a big role. The surgery decision was partially informed, but to
say it was an informed decision is an overstatement. Closely related to informed decision
is informed consent. Validated simulations through WIZER with the corresponding
knowledge bases and ontology can assist the patient to be knowledgeable and capable of
making an informed decision. A more sophisticated way is to have a replica procedure
using validated simulations. A departure in an actual procedure from the
validated-simulation procedure should trigger a question or an alarm. A replica hospital in
its entirety by validated simulations facilitated by WIZER is also possible.
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11.9  WIZER  and  Bioinformatics/Computational 
Biology/Systems  Biology 


Recent  advances  in  bioinformatics  (Keedwell  and  Narayanan  2005),  computational 
biology  (Haubold  and  Wiehe  2006,  Fall  et  al.  2005),  and  systems  biology  (Szallasi  et  al. 
2006)  open  up  exciting  collaborative  efforts  intersecting  biology,  medical  science,  and 
computer science. Biology is an experimentally driven science, as evolutionary processes
are not understood well enough to allow theoretical inferences of the kind made in
physics. Quantitatively, biological systems are extremely challenging: they have a
large range of spatial and temporal scales, a wide range of sensitivities to perturbations,
incomplete evolutionary records, multiple functionalities, multiple levels of signal
processing, and no separation between responses to external stimuli and internal
programs.

The  computational  challenge  in  bioinformatics,  systems  biology,  and 
computational  biology  is  immense;  the  complexity  of  biological  systems  includes  the 
molecular  underpinnings,  the  data  from  experimental  investigations  need  extensive 
quantitative  analysis,  and  it  is  not  computationally  feasible  to  analyze  the  data  without 
incorporating all knowledge about the biology in question. This reinforces the sense that
a knowledge-based approach is needed to tame the computational complexity. Synergistic
use  of  experimental  data,  computation,  and  domain  knowledge  is  essential. 

For  simulations  to  be  useful,  they  need  to  be  validated.  Conventionally,  validation 
is  done  with  minimal  computational  help.  A  recent  successful  simulation  model  is 
Archimedes,  a  diabetes  model,  which  was  validated  semi-manually.  WIZER  can  play  a 
small  part  in  bioinformatics/systems  biology/computational  biology  by  facilitating 
validation  and  knowledge  management  of  biological  simulations.  A  knowledge-based 
and  ontological  approach  as  implemented  in  WIZER  can  reduce  the  amount  of  search  and 
the computational complexity in biological simulations.


11.10 Summary


This chapter discusses potential extensions of WIZER to realize super-simulations,
the Simulation/Knowledge Web, and others. It also discusses the applications of WIZER to
policy analysis, knowledge management and organization modeling, biomedical
informatics, and others.




Chapter  XII:  WIZER  Implementation 
and  User  Guide 


This  chapter  describes  the  implementation  of  WIZER,  provides  information  on 
knowledge  and  ontology  preparation  and  a  guide  for  the  use  of  WIZER. 


12.1  Code  Structure 


WIZER is implemented in C++, primarily because it is intended to be runnable on a
supercomputer. It does not yet have a shell similar to expert system shells. The planned
shell will include both the inference and the simulation access. Based on the CLIPS
model, an expert system shell coded in C, it should be feasible to structure this shell to be
runnable on a supercomputer.

The  C++  code  for  WIZER  follows  the  structure  of  a  forward-chaining  production 
system. Variables are encoded in a C++ structure; rules are implemented in another C++
structure  with  clauses  containing  nodes  having  the  structure  for  variables. 
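
The structure described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in WIZER's C++, and every name in it (Variable, Clause, Rule, forward_chain, and the sample rules and facts) is an assumption made for illustration, not WIZER's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A named variable bound to a value (hypothetical stand-in for the variable structure)."""
    name: str
    value: str


@dataclass(frozen=True)
class Clause:
    """A rule clause whose node has the same shape as a variable."""
    node: Variable


@dataclass(frozen=True)
class Rule:
    """If every antecedent clause matches a known fact, assert the consequent."""
    name: str
    antecedents: tuple
    consequent: Clause


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            matched = all(c.node in known for c in rule.antecedents)
            if matched and rule.consequent.node not in known:
                known.add(rule.consequent.node)
                changed = True
    return known


# Illustrative rules only; the variable names and values are invented.
rules = [
    Rule("r1",
         (Clause(Variable("school_absenteeism", "high")),),
         Clause(Variable("outbreak_suspected", "yes"))),
    Rule("r2",
         (Clause(Variable("outbreak_suspected", "yes")),
          Clause(Variable("season", "winter"))),
         Clause(Variable("candidate_disease", "influenza"))),
]
facts = {Variable("school_absenteeism", "high"), Variable("season", "winter")}
derived = forward_chain(facts, rules)
```

Here rule r2 can fire only after r1 has added its conclusion to the fact set, which is the defining behavior of forward chaining: inference runs from known facts toward conclusions.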

As currently implemented, Alert WIZER and the WIZER Inference Engine are
separate programs. They can be linked, but each is intended to be usable in its own right.


12.2  Knowledge  Configuration  for  WIZER 


As  a  knowledge-based  and  ontological  reasoning  system,  WIZER  needs  careful 
preparation of its knowledge bases and ontology. The inference mechanism, in the form
of a forward-chaining production system, is in place inside WIZER, as is the
mechanism for conflict resolution. Knowledge, however, needs to be input into WIZER
to allow useful inference and conflict resolution. Without proper knowledge, WIZER's
performance degenerates. In this chapter, I outline the steps to prepare knowledge and
use WIZER.

CLIPS: http://www.ghg.net/clips/CLIPS.html

Steps to prepare knowledge in the form of ontology and knowledge bases for
WIZER include:

1. Take or create the conceptual model of the simulation.

2.  Acquire  the  conceptual  and  causal  models  of  the  domain  knowledge,  that  is  to 
say,  the  empirical  knowledge  for  validation  and  model-improvement.  Also 
acquire  the  empirical  data. 

3.  Create  the  abstract  causal  model  from  the  conceptual  model.  This  abstract  causal 
model  defines  which  variable  influences  another  variable.  (This  abstract  causal 
model  can  be  thought  of  as  the  influence  model,  but  I  use  the  term  causal  model 
to  emphasize  causality.) 

4.  Create  the  concrete  causal  model  from  the  abstract  causal  model.  This  concrete 
causal  model  represents  how  a  variable  with  a  value  causes  another  variable 
having  another  value.  The  abstract  and  concrete  causal  models  expedite  getting  to 
the root cause of a problem. This is similar to the use of an environment lattice
in assumption-based truth maintenance systems, which allows perturbations to the
system descriptions.

5.  Create  the  process  logic/model  for  each  causal  relation  in  the  causal  model.  This 
process  logic  is  closely  tied  to  implementation  code. 

6. For each relevant output variable of a causal relation, create a
semantic/ontological  description  or  potential  classification  of  the  possibly 
dynamic  output/variable. 

7.  Create  rules  based  on  the  causal  model  and  the  process  logic. 

8.  Create  conflict  resolution  rules  based  on  the  causal  model  and  ontology.  The 
conflict  is  resolved  by  rule-based  and  ontological  reasoning. 

9. Introduce minimal model perturbation rules based on ontology and knowledge
bases to describe how the value/link adjustments are to be determined. If process
logic is available, it is also used to help determine how values/links should be
adjusted. The minimal model perturbation is closely related to the previous
conflict resolution step.

10. For all the steps above, relevant ontologies are created and used as needed.

Once these steps are completed, WIZER is ready to run the simulation validation.

Table 16, below, lists the time it took me to configure the model and to run
WIZER for the BioWar and CONSTRUCT validation scenarios in Chapters 6 and 7.
Being a program, WIZER runs at a speed that depends on computer speed, memory, and
storage capacity.


Table 16. Time for Knowledge Configuration of Testbed Scenarios

Configuration Step                          BioWar           CONSTRUCT
                                            (2 scenarios)    (3 scenarios)
------------------------------------------  ---------------  ---------------
Create a conceptual model, if it does       1 hour           1 hour
  not already exist
Acquire domain knowledge and data           14 days          40 days
  (including reformatting the knowledge
  and data)
Create an abstract causal model (or         1 hour           1 hour
  influence model) from the conceptual
  model
Create a concrete causal model from the     N/A              N/A
  causal model
Create a process model for the causal       N/A              N/A
  models
Create semantic/ontological                 4 hours          3 hours
  categorizations for potentially
  dynamic causal variables
Create rules based on causal models         7 days           4 days
Create conflict resolution rules            0.1 hour         0.1 hour
Create minimal perturbation rules for       0.1 hour         0.1 hour
  value/link adjustments
Create relevant ontologies for the steps    1 hour           0.5 hour
  above
Run the simulations                         21 days          7 days
Run WIZER                                   1 day            1 day

Table 17 provides an estimate of the time required to perform the knowledge
configuration steps and to run WIZER for BioWar and CONSTRUCT for their complete
validation. The differences in the lengths of time arise because the testbeds have
different conceptual structure, size, and complexity. The time is assumed to be for a
one-person “team” using a computer server with quad processors.

Table 17. Estimated Time for Knowledge Configuration for Complete Validation

Configuration Step                          BioWar           CONSTRUCT
------------------------------------------  ---------------  ---------------
Create a conceptual model, if it does       7 days           2 days
  not already exist
Acquire domain knowledge and data           14 days          7 days
Create an abstract causal model (or         7 days           3 days
  influence model) from the conceptual
  model
Create a concrete causal model from the     14 days          7 days
  causal model
Create a process model for the causal       14 days          7 days
  models
Create semantic/ontological                 7 days           7 days
  categorizations for potentially
  dynamic causal variables
Create rules based on causal models         7 days           4 days
Create conflict resolution rules            14 days          7 days
Create minimal perturbation rules for       14 days          7 days
  value/link adjustments
Create relevant ontologies for the steps    30 days          14 days
  above
Run the simulations                         60 days          10 days
Run WIZER                                   3 days           1 day
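
As a rough worked total, the per-step estimates in Table 17 can be summed. The sketch below assumes that the steps are carried out strictly one after another by the one-person team, which the table itself does not state; any overlap between steps would shorten the elapsed time. The short step labels are introduced here for illustration only.

```python
# Per-step estimates in days, copied from Table 17 as (BioWar, CONSTRUCT).
# The short step labels are introduced here for illustration.
ESTIMATES_DAYS = {
    "conceptual model": (7, 2),
    "domain knowledge and data": (14, 7),
    "abstract causal model": (7, 3),
    "concrete causal model": (14, 7),
    "process model": (14, 7),
    "semantic categorizations": (7, 7),
    "rules from causal models": (7, 4),
    "conflict resolution rules": (14, 7),
    "minimal perturbation rules": (14, 7),
    "ontologies": (30, 14),
    "run the simulations": (60, 10),
    "run WIZER": (3, 1),
}

# Assumption: steps run sequentially, so the totals are simple sums.
biowar_total = sum(b for b, _ in ESTIMATES_DAYS.values())
construct_total = sum(c for _, c in ESTIMATES_DAYS.values())

print(f"BioWar: {biowar_total} days, CONSTRUCT: {construct_total} days")
```

Under that assumption the sums come to 191 days for BioWar and 76 days for CONSTRUCT. In the BioWar column, running the simulations (60 days) and creating the ontologies (30 days) account for nearly half of the total.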

The following table shows the level of expertise each step needs.


Table 18. Expertise Level for Each Configuration Step

Configuration Step                    Expertise Level
------------------------------------  ------------------------------------------
Create a conceptual model             Knowledge modeling and domain knowledge
Acquire domain knowledge and data     Data entry
Create an abstract causal model (or   Program design or software architect, with
  influence model) from the           knowledge about the difference between
  conceptual model                    causation and correlation
Create a concrete causal model from   Program design or software architect, with
  the causal model                    knowledge about the difference between
                                      causation and correlation
Create a process model for the        Program design or software architect, with
  causal models                       knowledge about algorithms and processes
Create semantic/ontological           Data classification and domain knowledge,
  categorizations for potentially     with knowledge about ontology
  dynamic causal variables
Create rules based on causal models   Program design or software architect, with
                                      knowledge about rule-based systems
Create conflict resolution rules      Program design or software architect
Create minimal perturbation rules     Program design or software architect
  for value/link adjustments
Create relevant ontologies for the    Program design or software architect, with
  steps above                         knowledge about ontology
Run the simulations                   Programmer
Run WIZER                             Programmer

Thus,  at  the  minimum,  to  configure  the  knowledge  for  and  to  run  WIZER,  four  people  are 
needed:  one  domain  expert,  one  knowledge  engineer,  one  program  designer/software 
architect,  and  one  programmer  who  can  handle  data  entry.  An  advanced  programmer  can 
become  a  program  designer  and  software  architect  with  training  in  software  modeling 
techniques.  If  the  conceptual  model  already  exists,  which  should  be  the  case  for  most 
simulators,  the  number  of  persons  needed  reduces  to  two:  one  software  architect/program 
designer and one programmer. If speed is essential, another person can be added whose
tasks deal solely with the acquisition, preparation, and formatting of empirical knowledge
and data.

To create the conceptual model, one process scenario would be for the domain
expert and the knowledge engineer or software engineer to talk to each other. The talk
should proceed informally first. After an informal understanding between the two is
reached, the knowledge engineer formally extracts the knowledge from the domain expert,
step by step.

For the rest of the knowledge configuration and WIZER run, a process scenario
would be for a program designer or an advanced programmer to prepare the causal model,
rules, semantic categorization of data, conflict resolution rules, model perturbation rules,
and ontology. This person also leads the running of WIZER and the interpretation of
the results, assisted by a basic-level programmer or data entry person in the
running of WIZER and in the acquisition of empirical data and knowledge.


12.3 An Example of Knowledge Configuration


A  small  portion  of  the  code  of  the  BioWar  simulator  is  presented  below  in  pseudo-code 
to  serve  as  an  example  of  the  knowledge  configuration  steps.  The  pseudo-code  represents 
a  procedure  which  determines  whether  an  agent  gets  infected  with  a  disease  in  an 
outbreak. 


procedure Outbreak

    let outbreak = the outbreak
    let agent = the agent in which the outbreak may cause infection

    if agent has the outbreak (by strain) already
        do not reinfect the agent
    end of if

    dist = distance between this agent position and the location of the outbreak
    if dist > ailment_effective_radius
        disease_contact_probability = a decaying function of ailment_effective_radius
    else
        disease_contact_probability = 1.0
    end of if

    person_risk = risk of getting this disease based on age, disease type
    adjust person_risk by risk multiplier and risk cap

    base_rate = initial rate of getting an infection from a susceptible state
                for this outbreak
    adjust base_rate by base_rate cap

    infection_modifier_prophylaxis = the effect of an intervention or prophylaxis

    total_risk = disease_contact_probability * base_rate * person_risk *
                 infection_modifier_prophylaxis

    if a random dice throw < total_risk
        infect this agent by this outbreak
    end of if

end  of  procedure 
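
For readers who prefer runnable code, the pseudo-code can be sketched in Python as follows. This is an illustration under stated assumptions, not the BioWar source: the class and function names are invented, the decay is assumed to be exponential in the distance beyond the effective radius (the pseudo-code only says "a decaying function"), and the multiplier and caps are assumed to be applied as a product followed by a minimum. The effective radius is assumed to be positive.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    x: float
    y: float
    base_person_risk: float          # stand-in for the age/disease-type lookup
    infections: set = field(default_factory=set)


@dataclass
class Outbreak:
    strain: str
    x: float
    y: float
    ailment_effective_radius: float  # assumed > 0
    base_rate: float
    base_rate_cap: float = 1.0
    risk_multiplier: float = 1.0
    risk_cap: float = 1.0


def total_risk(agent, outbreak, infection_modifier_prophylaxis=1.0):
    """Combine the four factors of the pseudo-code into one probability."""
    dist = math.hypot(agent.x - outbreak.x, agent.y - outbreak.y)
    radius = outbreak.ailment_effective_radius
    if dist > radius:
        # Assumed decay: exponential in the distance beyond the radius.
        contact = math.exp(-(dist - radius) / radius)
    else:
        contact = 1.0
    # Assumed adjustment: scale by the multiplier, then apply the cap.
    person_risk = min(agent.base_person_risk * outbreak.risk_multiplier,
                      outbreak.risk_cap)
    base_rate = min(outbreak.base_rate, outbreak.base_rate_cap)
    return contact * base_rate * person_risk * infection_modifier_prophylaxis


def maybe_infect(agent, outbreak, rng=random, infection_modifier_prophylaxis=1.0):
    """Return True if the agent becomes infected by this outbreak."""
    if outbreak.strain in agent.infections:
        return False                  # do not reinfect the agent
    risk = total_risk(agent, outbreak, infection_modifier_prophylaxis)
    if rng.random() < risk:           # the "random dice throw"
        agent.infections.add(outbreak.strain)
        return True
    return False


# Invented example values.
outbreak = Outbreak("strain-A", 0.0, 0.0, ailment_effective_radius=2.0, base_rate=0.5)
near = Agent(1.0, 0.0, base_person_risk=0.8)
far = Agent(4.0, 0.0, base_person_risk=0.8)
```

Passing the random source as a parameter (rng) is a design choice of this sketch: it lets a validation run substitute a seeded or stubbed generator so that an infection decision can be reproduced exactly.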


The  step-by-step  procedure  for  the  knowledge  configuration  related  to  the  above  routine 
is  as  follows. 

1.  The  conceptual  model  of  the  above  routine  is  a  simple  diagram  depicting  the 
relationship  between  an  outbreak  and  an  agent. 

2.  The  empirical  data  is  gathered  for  age  risk  factors. 

3. The abstract causal model for the above routine is as follows, written in N3.

   <infection of an agent> <is caused by> <total_risk> .
   <total_risk> <consists of> <disease_contact_probability, base_rate,
       person_risk, infection_modifier_prophylaxis> .
   <base_rate> <is influenced by> <base_rate_cap> .
   <person_risk> <is influenced by> <age, disease type, risk multiplier, risk cap> .
   <disease_contact_probability> <is influenced by>
       <dist, ailment_effective_radius> .

4. Part of the concrete causal model is as follows.

   <disease_contact_probability = 1.0> <is caused by>
       <dist less than or equal to ailment_effective_radius> .
   <disease_contact_probability = a decay function> <is caused by>
       <dist greater than ailment_effective_radius> .
   <agent previous infection> <prevents> <reinfection> .

5.  The  process  logic  is  the  pseudocode  augmented  by  semantics  (knowledge  base) 
and  ontology. 

6.  For  each  variable  of  disease_contact_probability,  base_rate,  person_risk,  and 
infection_modifier_prophylaxis,  semantic  categories  are  created  for  their  dynamic 
values.  This  requires  domain  knowledge.  As  an  example,  the  determination  of 
semantic  categories  depends  on  the  medical  knowledge  about  the  infectiousness 
of  a  disease. 

7.  Rules  related  to  the  causal  relations  are  created. 

8.  Conflict  resolution  rules  for  the  variables  are  created. 

9.  Minimal  model  perturbation  rules  for  the  variables  are  created. 

10.  The  relevant  ontology  is  created. 
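
The abstract causal model of step 3 can be held as subject-predicate-object triples and walked backward to list every variable that may lie behind a suspicious output. The sketch below is a hypothetical Python encoding, not WIZER's N3 machinery: the triple list restates step 3 with each multi-valued object expanded into one triple per value, and the function names are assumptions.

```python
# Abstract causal model of step 3 as (subject, predicate, object) triples.
TRIPLES = [
    ("infection_of_agent", "is caused by", "total_risk"),
    ("total_risk", "consists of", "disease_contact_probability"),
    ("total_risk", "consists of", "base_rate"),
    ("total_risk", "consists of", "person_risk"),
    ("total_risk", "consists of", "infection_modifier_prophylaxis"),
    ("base_rate", "is influenced by", "base_rate_cap"),
    ("person_risk", "is influenced by", "age"),
    ("person_risk", "is influenced by", "disease_type"),
    ("person_risk", "is influenced by", "risk_multiplier"),
    ("person_risk", "is influenced by", "risk_cap"),
    ("disease_contact_probability", "is influenced by", "dist"),
    ("disease_contact_probability", "is influenced by", "ailment_effective_radius"),
]


def upstream(variable, triples=TRIPLES):
    """Return every variable that directly or indirectly influences `variable`."""
    found = []
    frontier = [variable]
    while frontier:
        current = frontier.pop()
        for subject, _predicate, obj in triples:
            if subject == current and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def root_causes(variable, triples=TRIPLES):
    """Upstream variables that nothing else in the model influences."""
    subjects = {s for s, _p, _o in triples}
    return sorted(v for v in upstream(variable, triples) if v not in subjects)
```

Calling root_causes("person_risk") on this model yields age, disease_type, risk_cap, and risk_multiplier, which illustrates how a causal model narrows the search: if person_risk looks wrong, only those four inputs need to be examined.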


12.4  Summary 


This chapter describes the code structure of WIZER and outlines the steps for knowledge
base creation and ontology preparation for WIZER. It provides a guide for the use of
WIZER.


Chapter XIII: Discussion


This chapter summarizes the contributions, limitations, and potential extensions of this
dissertation research. The contributions are both conceptual/theoretical and practical. The
limitations show themselves in the current WIZER implementation and also in the lack of
learning capability, for example. Potential extensions, including adding a learning
capability, are described.


13.1  Contributions 


The contributions of this thesis are threefold. First, I developed a novel conceptual
approach for the automated validation of multi-agent simulation tools. Second, I
implemented and tested an automated tool for validation (WIZER), based on this
conceptualization. Third, I examined the added value of using this tool for validation
using two distinct data sets and multi-agent simulations. The results indicate that WIZER
speeds up the rate of validation, reduces the amount of necessary search, and focuses the
search based on knowledge. The conceptual contributions include shedding light on the
knowledge, logic types, and structures of simulation (the relationships between
simulation code, process logic, causal logic, conceptual model, ontology, and empirical
data and knowledge), a novel knowledge-based and ontological approach to validation
automation, and the integration of simulation and knowledge management. Previously,
knowledge inference and simulation were considered to be separate, as were knowledge
management and simulation. This thesis indicates that these fields are closely intertwined
and that they should inform each other closely.

As noted before, this thesis described the implementation of the conceptualization
of knowledge-based and ontological validation in a tool called WIZER. The tool was
implemented in four parts: Alert WIZER, the Inference Engine, the Simulation
Knowledge Space, and the Domain Knowledge Space. This thesis demonstrated that:

1. Semantic categorizations of data and semantic control of statistical routines were
made  feasible  by  the  Alert  WIZER. 

2.  Knowledge-based  and  ontological  reasoning  and  parameter  value  adjustments 
were  made  feasible  by  the  Inference  Engine  in  WIZER. 

3.  Explicit  encoding  and  computer  processing  of  simulation  and  domain  knowledge 
were  made  feasible  by  the  Simulation  Knowledge  Space  and  the  Domain 
Knowledge  Space  as  used  in  WIZER. 

4. WIZER is a general tool, as evidenced by validation done on two simulation
models, BioWar and CONSTRUCT.

5.  WIZER  speeds  up  the  rate  of  validation,  reduces  the  amount  of  necessary  search 
in  parameter  space  for  validation,  and  increases  the  focus  of  the  search  to  the  most 
relevant  area  for  validation. 

By  augmenting  simulations  with  WIZER,  simulation  validation  can  be  automated 
and  simulation  knowledge  can  be  made  clear  and  operable.  Tools  such  as  WIZER  are 
important  as  they  help  clarify  and  speed  up  the  validation  process,  in  addition  to  helping 
automate  the  process. 


13.2  Limitations 


The  dissertation  does  not  implement  all  the  conceptual  potentials  of  WIZER.  These 
include  experiment  design  (which  can  be  implemented  via  appropriate  ontology  and 
causal  rules),  simulation  control  (again,  this  can  be  implemented  through  ontology  and 
causal  rules),  and  a  more  sophisticated  version  of  hypothesis  building. 

In summary, the limitations include:

1. No learning capability, except inference and a simple search for hypothesis
building.

2. No causal and/or model learning from data is implemented.

3. No derivation of causal relations from the simulation model or code.

4. Experiment design is not implemented.

5. Simulation control is not implemented.

6. Data limitations prevent more extensive and comprehensive validation and
model-improvement trials.

7. The reasoning mechanisms are simple, limited to forward chaining and ontological
reasoning.

8. The validation of knowledge bases is not covered. Knowledge bases should be
validated against textbook knowledge, expert knowledge, and empirical data.
How exactly to do this is a research topic in its own right.


13.3  Discussion 


This dissertation provides a knowledge-based and ontological approach to validation with
a side effect of model-improvement. Readers should be able to use the approach and the
tool to do extensive validation of simulations and simple model-improvement.

The work in this dissertation can be extended in many different ways. The
immediate extensions are probing the structure and parameter trade-offs (in the conflict
resolution and value/link adjustment routines), probing how to automatically derive
causal models from the conceptual model, how to automatically derive process
models from the causal model, how to automatically construct a conceptual model from
empirical data and knowledge (a data mining and machine learning problem), and how to
automatically infer process models from code, causal models from the process model,
and conceptual models from the causal model. The derivations and inferences may be
aided by ontology and higher-level knowledge.

For agent-based simulations, a graphical user interface is needed for displaying
and/or editing code, pseudocode, process models, causal models, the conceptual model, and
empirical data and knowledge. This graphical user interface should provide commands to
run simulations, view the results, and see/trace the effect of knowledge bases and
ontology.

Other  extensions  include: 

1. Adding an experiment design module,

2.  Adding  a  simulation  control  module, 

3.  Adding  a  mathematical  ontology  module, 

4.  Making  the  inference  engine  more  sophisticated, 

5.  Enhancing  conflict  resolutions, 

6.  Enhancing  value/link  adjustment, 

7. Adding better model-improvement capabilities,

8.  Adding  a  module  for  the  validation  of  knowledge  bases, 

9. Validating the BioWar and CONSTRUCT testbeds comprehensively,

10.  Examining  more  testbeds,  including  the  Archimedes  simulator. 

Knowledge-based systems have been successfully embedded in many
software applications, including tax preparation, business rules, and grammar/spelling
checker applications. Integrating knowledge bases and ontology with simulation should
ease the validation and model-improvement process, in addition to facilitating knowledge
management of simulations.

Data mining tools can extract rules or relationships from empirical data. They can
also extract rules or relationships from simulation data, by running the simulations and
analyzing the resulting wealth of data. Their capabilities, however, are limited to
classification and correlation. Neither causal relations nor conceptual models can be
obtained with data mining tools. Thus data mining tools are most useful in providing
inputs for Alert WIZER in the form of symbolic categories and statistical measures of
data. They can be useful for characterizing the simulation knowledge space (as rules
extracted from simulated data) or the domain/empirical knowledge space (as rules
extracted from empirical data) in the form of correlations between variables. No work has
yet been done in data mining or machine learning on extracting process models/logic from
simulation code, extracting causal models from the process model/logic, or inferring
conceptual models from the causal model. Inference in the other direction is not yet
automated either: deriving conceptual models from human knowledge, causal models
from conceptual models, process models/logic from causal logic and domain knowledge,
and code or rules from process models/logic. Real-world causality in particular is a major
research problem: it involves searching for cause-and-effect regularities in a not-so-orderly
and uncertain world, and it depends on, is inferred from, and should infer the (almost
certain) predictable regularities to work. The above inference problems form an exciting
area of future research for data mining and machine learning. Furthermore, this
dissertation indicates that it is feasible to advance research on the fundamental and
theoretical foundations for WIZER based on natural science (e.g., the physical nature of
causality), computer science (including machine learning), and mathematics.


13.4 Summary


This dissertation demonstrates the utility of a knowledge-based and ontological approach
for validation and model-improvement of simulations, particularly social simulations.
The validation scenario results of two testbeds show that the tool WIZER is useful for
validation and model-improvement.

The contributions of this dissertation include a new conceptualization based on
knowledge and ontology for validation and model-improvement of simulations, a tool
implementing this conceptualization, partial validation results of the BioWar and
CONSTRUCT simulators, a new simulation description logic, and knowledge-based
hypothesis building and testing. This dissertation indicates that validated simulations are
essential for the examination of causal relations to achieve better causal relation validity.

While many things about WIZER can be improved, this dissertation indicates
there is a good conceptual foundation for future improvements of WIZER, including
the experiment design and simulation control enhancements.


Appendix A. Modeling and Simulation


Modeling  and  simulation  (Law  and  Kelton  2000)  is  an  approach  for  developing  a  level  of 
understanding  of  the  interaction  of  the  parts  of  a  system,  and  of  the  system  as  a  whole.  It 
is  one  of  the  most  widely  used  operations-research  and  management  science  techniques 
(two  others  are  mathematical  programming  and  statistics).  The  level  of  understanding 
which  may  be  developed  using  modeling  and  simulation  is  seldom  achievable  otherwise, 
except  possibly  for  system  dynamics.  A  system  is  an  entity  which  maintains  its  existence 
through  the  interaction  of  its  parts.  A  model  is  a  simplified  representation  of  the  actual 
system  intended  to  elicit  understanding. 

This appendix discusses simulation types and where WIZER stands among
them. It also indicates a way of learning simulation models from data, utilizing WIZER
capabilities. The simulation types include discrete event simulation, continuous
simulation, and agent-based simulation.


A.1 Simulation Model Classification


Simulation  models  can  be  classified  along  several  dimensions: 

1.  Time:  a  simulation  model  can  be  static  or  dynamic.  A  static  simulation  model  is 
one  in  which  time  plays  no  role  or  one  that  represents  a  snapshot  of  a  system  at  a 
particular  time.  A  dynamic  simulation  model  represents  a  system  as  it  evolves 
over  time. 

2.  Randomness:  a  simulation  model  can  be  deterministic  or  stochastic.  A  simulation 
is  deterministic  if  the  model  underlying  this  simulation  does  not  contain  any 
random  or  probabilistic  components.  Otherwise,  a  simulation  model  is  stochastic. 

3. Continuity: a simulation model can be continuous or discrete. A continuous
simulation model represents a continuously evolving system often describable by
differential equations. A discrete simulation model represents changes of a system
as separate events.

Most simulations are stochastic and dynamic. So a more cogent way of
classifying simulations is to divide them into discrete event simulation models, continuous
simulation models, and agent-based simulation models. The last is widely used as a
versatile technique to model social and heterogeneous population systems. Of course,
these models can be deterministic and/or static, but most are not, hence the new
classification.


A.2  Discrete  Event  Simulation 


Discrete  event  simulation  concerns  the  modeling  of  a  system  as  it  evolves  over  time  by 
representing  the  changes  as  separate  events.  This  is  the  opposite  of  continuous  simulation 
where  the  system  evolves  as  a  continuous  function. 

Among  the  discrete  event  simulation  formalisms  is  the  finite  state  machine.  A 
finite  state  machine  is  a  programming  construct  which  proceeds  in  separate  and  discrete 
steps  from  one  to  another  of  a  finite  number  of  configurations  or  states. 
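
A finite state machine driven by timestamped events can be sketched in a few lines. The states, events, and transition table below are invented for illustration (a toy disease progression) and are not taken from BioWar or any other simulator discussed here.

```python
# Hypothetical disease-progression machine: states and events are invented.
TRANSITIONS = {
    ("susceptible", "exposed_to_outbreak"): "infected",
    ("infected", "symptoms_appear"): "symptomatic",
    ("symptomatic", "treated"): "recovered",
    ("infected", "treated"): "recovered",
}


def step(state, event):
    """Return the next state; an event with no matching transition leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(initial_state, events):
    """Feed timestamped events in time order and record each state change."""
    state = initial_state
    history = [(0, state)]
    for time, event in sorted(events):
        new_state = step(state, event)
        if new_state != state:
            state = new_state
            history.append((time, state))
    return history


events = [(5, "symptoms_appear"), (2, "exposed_to_outbreak"),
          (3, "unrelated_event"), (9, "treated")]
history = run("susceptible", events)
```

The system changes only at the event times (2, 5, and 9 here); nothing happens between events, which is what distinguishes this style from a continuous simulation.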


A.3  Continuous  Simulation 

In continuous simulation, the system evolves in a continuous fashion, which can be
described by differential equations. System dynamics is one approach to continuous
simulation. Strictly speaking, a truly continuous simulation can be accomplished
only with an analog computer. On a digital computer, one can approximate or emulate
a continuous simulation by making the time step of the simulation sufficiently
small.
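
The effect of the time step can be seen in a small Python sketch. It assumes, purely for illustration, the decay equation dx/dt = -k x, whose exact solution is x(t) = x0 exp(-k t), and approximates it with fixed-step forward Euler integration. The equation, the parameter values, and the function name euler_decay are assumptions and do not come from the thesis.

```python
import math


def euler_decay(x0, k, t_end, dt):
    """Approximate dx/dt = -k * x on [0, t_end] with forward Euler steps of size dt."""
    steps = round(t_end / dt)
    x = x0
    for _ in range(steps):
        x += -k * x * dt  # one discrete update standing in for continuous change
    return x


exact = math.exp(-1.0)  # exact value of x(1) for x0 = 1, k = 1
coarse_error = abs(euler_decay(1.0, 1.0, 1.0, 0.1) - exact)
fine_error = abs(euler_decay(1.0, 1.0, 1.0, 0.001) - exact)
print(coarse_error, fine_error)
```

Reducing the time step from 0.1 to 0.001 shrinks the error of the digital approximation, at the cost of more update steps.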


A.4 Agent-based Simulation


Agent-based simulation differs from traditional kinds of simulation in that some or
all of the simulated entities are modeled in the form of agents. An agent is an
abstraction of an individual. Because it explicitly attempts to model the specific
behaviors of specific individuals, agent-based simulation contrasts with methods
that average the characteristics of a population and simulate changes in these
averaged characteristics for the population as a whole.
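
The contrast between individual-level and averaged modeling can be sketched in a few lines of Python. The Agent class, its recovery_day attribute, and the numbers used are hypothetical; the sketch only shows how heterogeneous individuals can produce a different population count than a model built on the population average.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    recovery_day: int  # hypothetical per-individual attribute


def sick_count_agent_based(agents, day):
    """Count agents still sick on a given day, using each agent's own recovery day."""
    return sum(1 for agent in agents if day < agent.recovery_day)


def sick_count_averaged(agents, day):
    """Treat every agent as recovering on the population's mean recovery day."""
    mean_recovery = sum(agent.recovery_day for agent in agents) / len(agents)
    return len(agents) if day < mean_recovery else 0


agents = [Agent(2), Agent(4), Agent(12)]  # mean recovery day is 6
print(sick_count_agent_based(agents, 5), sick_count_averaged(agents, 5))
print(sick_count_agent_based(agents, 8), sick_count_averaged(agents, 8))
```

On day 5 the averaged model still counts everyone as sick while the agent-based count is one, and on day 8 the averaged model counts no one while one agent is in fact still sick.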


A.5  Simulation  Acceptance 


While simulations have been used successfully for many tasks, wider acceptance of
simulation is still impeded by the impression that simulation is just a toy, not a
serious tool for decision analysis and decision making. Specifically, simulation
acceptance is hindered by several factors, including:

1.  Models  for  large-scale  systems  tend  to  be  very  complex  and  coding  them  is  an 
arduous  task. 

2.  A  large  amount  of  computing  time  is  required. 

3.  Simulation  is  thought  of  as  just  an  exercise  in  computer  programming.  This 
neglects  the  important  issue  of  how  a  properly  coded  model  should  be  used  to 
make  inferences  about  the  system  of  interest. 

4. Understanding a simulation depends on judgment calls about how much detail of
a problem should be modeled.

5. The development of simulation systems is often an iterative process by necessity.
This sometimes invites the question, “If you say the simulation or program is
valid, why is it that you have another version?”

This dissertation mitigates the above factors by augmenting simulation (and
modeling) with a symbolic, knowledge-based, and ontological representation and
reasoning system. Specifically, this symbolic reasoning system has the following
capabilities:

1.  It  grounds  the  complex  models  of  large-scale  systems  on  ontologies  and 
knowledge  bases  enabling  them  to  be  reasoned  about. 

2. It reduces the amount of computing required through knowledge-based and
ontological inference.

3. It allows symbolic validation of the simulation. Symbolic validation,
ontologies, and knowledge bases facilitate the use of the simulation model to
make inferences about the system of interest.

4. It guides how much detail a simulation should model through symbolic,
knowledge-based, and ontological reasoning about the policy question at hand.

5. It allows explanations of simulation occurrences, particularly which
variables or combinations of variables cause simulation outcomes and why.

6. It represents the simulation model explicitly and symbolically (which allows
reasoning), enabling an improved iterative process of simulation development,
with WIZER aiding the validation in each development cycle.


A.6 Simulation and WIZER


The BioWar simulation has components of both agent-based simulation and discrete
event simulation. Its outputs are often in the form of (conceptually) continuous
curves. WIZER can handle the validation of BioWar, which includes the handling and
understanding of agents, discrete events, and continuous curves. This is because
WIZER is a tool implementing the knowledge-based and ontological approach. WIZER
performs inference on simulations. By analogy with database systems, WIZER acts on
a simulation as SQL (Structured Query Language) acts on a database. The following
figure illustrates this point.


Figure A.1. Analogy of WIZER’s Role to Simulations as SQL’s Role to Databases

Simulations as computer programs are unconstrained, which means that anything
can be simulated, including processes and things that are unreal and erroneous. The
constraints of physical reality are not built into simulations by default. They must
be carefully designed and implemented. From the validation point of view, the
physical constraints on simulation reduce the size of the search space for
validation. Thus the physical constraints encoded by ontologies and knowledge bases
are critical for validation. Another constraint that is important and useful for
validation is the policy question constraint. Similarly encoded in ontologies and
knowledge bases, the policy question restricts the scope and quantity of search for
validation. The policy design and analysis field requires the application of careful
and clear thinking in defining problems and finding solutions without needing to do
extensive search. Only humans can perform policy design and analysis competently at
the time of this writing. WIZER facilitates the use of both types of constraints for
validation and model improvement.

Parametric statistical methods assume sample independence, normality, and
randomness. Absent sample independence and normality, non-parametric methods can
still work provided that the samples are random. Simulations, on the other hand, do
not need these assumptions. Instead of samples, entities and relationships can be
modeled in great detail, as detailed as needed to answer a policy or research
question. WIZER adds to this capability of simulations by providing a symbolic or
semantic companion for simulation. This means that all events that happen in a
simulation can be explained, used for reasoning, and controlled symbolically or
semantically. Simulations can model networks, game-theoretic systems, and any other
system. WIZER can describe the simulations of these systems semantically and perform
suitable inferences. In the spectrum of increasing realism (and of increasing need
for better data), statistics (parametric, non-parametric, and Bayesian) is at the
start of the spectrum while physically realistic simulation is at the end.
Agent-based simulation is near the end of the spectrum. The utility of WIZER
increases as the simulation moves toward the end of the spectrum. The KISS principle
(Keep It Simple, Stupid) usually restricts simulations to be simple, partly because
of the difficulty of validation. WIZER helps validate more complex simulations, and
thus helps push the realization of more complex and realistic simulations toward the
end of the spectrum, especially simulations designed for use in policy making.


A.7 Simulation Model Learning from Data


Learning simulation models, or any models, from data is non-trivial. Recent
research, however, points to a possible way to learn simulation models. It is done
through causal learning from data. Once the causal relationships are learned (the
relationships may have a probability of causation assigned to them), they are put
together in a simulation model. The simulation is run and the strengths of causal
relationships are adjusted based on the match against empirical data. In other
words, WIZER allows validated simulations to verify the strengths of learned causal
relationships. Depending on the data, detailed causal relationships can be
extracted. More detailed causal relationships may approach the underlying processes
and mechanisms.

To  get  to  the  underlying  processes  and  mechanisms,  a  simulation  model  via 
WIZER  can  perform  an  exploratory  search  for  possible  processes/mechanisms  (based  on 
physical  knowledge  encoded  in  ontology,  for  example)  for  a  given  causal  relationship. 
This  is  similar  to  the  generate-and-test  procedure.  A  more  sophisticated  hypothesis 
formation  can  also  be  employed.  After  a  new  process/mechanism  is  hypothesized  for  a 
given  causal  relationship,  the  simulation  is  run,  the  results  of  this  simulation  are  validated 
using  WIZER,  and  whether  the  hypothesized  process/mechanism  is  valid  can  be  assessed. 
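
The generate-and-test idea can be sketched in Python as follows. This is only an illustration under stated assumptions and not WIZER's actual procedure: the two candidate mechanisms, the observed series, the tolerance, and the function names simulate and generate_and_test are all hypothetical, and the validation step is reduced to a maximum-error comparison against the data.

```python
def simulate(mechanism, steps, x0):
    """Run a one-variable simulation in which each step applies the mechanism."""
    values = [x0]
    for _ in range(steps):
        values.append(mechanism(values[-1]))
    return values


# Hypothetical candidate mechanisms for how the variable changes per step.
CANDIDATES = {
    "constant_increment": lambda x: x + 1.0,
    "doubling": lambda x: 2.0 * x,
}


def generate_and_test(candidates, observed, tolerance):
    """Keep the candidates whose simulated series stays within tolerance of the data."""
    accepted = []
    for name, mechanism in candidates.items():
        simulated = simulate(mechanism, len(observed) - 1, observed[0])
        error = max(abs(s - o) for s, o in zip(simulated, observed))
        if error <= tolerance:
            accepted.append(name)
    return accepted


observed = [1.0, 2.1, 3.9, 8.2]  # made-up data for illustration only
accepted = generate_and_test(CANDIDATES, observed, tolerance=0.5)
print(accepted)
```

Each hypothesized mechanism is plugged into the simulation, the simulation is run, and the mechanism is retained only if the output matches the data closely enough.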

The above indicates that it is conceptually possible to learn a simulation model
from data, an improvement over learning only Bayesian networks or causal networks.
Using validated simulations and WIZER to uncover true causal relations is a novel
concept. Previously, causality was assumed to be graph-based (Pearl 2000), with
many “exogenous” assumptions abstracted away.


A.8 Summary


This  appendix  positions  WIZER  with  respect  to  simulation  types.  It  shows  that  WIZER 
acts  as  a  symbolic  manager  and  explainer  for  simulations.  It  is  capable  of  inference  and 
hypothesis  building  and  testing.  How  WIZER  can  conceptually  facilitate  the  learning  of 
simulation  models  from  data  is  also  explained. 


Appendix B. Augmenting System Dynamics


System  dynamics  is  one  of  the  most  successful  methods  for  modeling  and  understanding 
systems  as  a  whole,  not  just  as  parts.  Its  core  concept  is  the  understanding  of  how  all  the 
objects  and  people  in  a  system  interact  with  one  another.  The  objects  and  people  can 
interact through feedback loops, where a change in one variable affects other
variables over time, which in turn affect the original variable, and so on.

This  appendix  describes  the  state  of  the  art  of  system  dynamics.  It  then  explains 
how  WIZER  could  be  employed  to  augment  system  dynamics. 


B.1 Description of System Dynamics


System  dynamics  is  concerned  with  the  behavior  of  a  system  over  time  or  the  dynamic 
behavior  of  the  system.  Identifying  its  key  patterns  of  behavior,  known  as  “time  paths”  or 
“curves”,  is  crucial.  The  time  paths  include  the  linear,  exponential,  goal-seeking, 
oscillation,  and  S-shape  families  of  time  paths  or  curves. 

In system dynamics modeling, all dynamic behaviors in the world are assumed to
occur in flows which accumulate in stocks. The following figure shows the stock and
flow diagram for credit card inflow (Ratha 2001).

[Figure: stock-and-flow diagram with the stock Balance Payable and the constant CREDIT LIMIT]

Figure B.1. Stock and Flow Diagram for Credit Card Inflow


As  shown,  the  box  indicates  a  stock  “Balance  Payable”,  where  elements/amounts 
accumulate  and  then  increase  and  decrease  over  time.  The  circle  with  a  small  T  on  the  top 
represents  a  flow  control,  similar  to  a  valve.  The  circles  with  extending  lines  and  arrows 
indicate  influences,  which  can  be  a  positive  or  negative  feedback.  Finally  the  clouds 
indicate  indeterminate  supply.  System  dynamics  uses  mathematical  equations  including 
differential/difference  equations  to  relate  stock  with  flow. 

In the example for Credit Card Inflow, the equations are as follows (where t
denotes time and dt denotes the time difference).

    Balance_Payable(t) = Balance_Payable(t - dt)
        + (credit_card_purchases + interest_charges - payments) * dt
    INITIAL VALUE = 0
    SPENDING_FRACTION = 0.1
    credit_card_purchases = SPENDING_FRACTION * available_credit
    CREDIT_LIMIT = 10000
    available_credit = CREDIT_LIMIT - Balance_Payable

The constants SPENDING_FRACTION and CREDIT_LIMIT are an instantiation for a
particular credit card holder. Balance_Payable is computed using a difference
equation between time t and time (t - dt).
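
The difference equation above can be stepped forward directly. The following Python sketch does so under one explicit assumption: the listing does not define interest_charges or payments, so they are treated here as constant per-step parameters that default to zero. The function name simulate_balance is hypothetical.

```python
SPENDING_FRACTION = 0.1
CREDIT_LIMIT = 10000.0


def simulate_balance(steps, dt=1.0, interest_charges=0.0, payments=0.0):
    """Step the Balance_Payable difference equation forward and return its history.

    interest_charges and payments are not defined in the source listing; treating
    them as constants (zero by default) is an assumption made for illustration.
    """
    balance = 0.0  # INITIAL VALUE = 0
    history = [balance]
    for _ in range(steps):
        available_credit = CREDIT_LIMIT - balance
        credit_card_purchases = SPENDING_FRACTION * available_credit
        balance = balance + (credit_card_purchases + interest_charges - payments) * dt
        history.append(balance)
    return history


history = simulate_balance(steps=3)
print(history)
```

With no interest or payments, the balance rises by a tenth of the remaining credit at each step, so it approaches CREDIT_LIMIT without exceeding it, which is an instance of the goal-seeking time path described earlier.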




B.2  WIZER  and  System  Dynamics 


Because Alert WIZER can characterize curves or time paths (e.g., the S-shape of a
curve) semantically, it can be used to convert the time paths of system dynamics
into a knowledge format ready for knowledge inference.

Moreover,  the  level  of  stock  and  the  speed  of  flow  can  be  characterized 
semantically  by  WIZER  using  its  knowledge  bases  and  ontology.  A  specific  type  of  stock 
and/or  flow  can  have  different  effects  and  this  can  be  characterized  by  WIZER  using 
ontology.  As  WIZER  does  not  require  continuous  stock  and  flow,  jagged  or  abrupt 
transitions  can  be  handled  properly.  What  this  abrupt  transition  means  can  be  captured  by 
WIZER.  Using  mathematical  ontology,  WIZER  can  describe  the  differential  equations  in 
system  dynamics. 

WIZER  can  validate  the  results  of  system  dynamics  simulation  against  empirical 
data.  The  system  dynamics  model  of  stocks,  flows,  and  feedbacks  is  encoded  in 
knowledge  bases  and  causal/process  ontology  in  WIZER.  Based  on  the  results  of 
empirical  data  comparison  against  system  dynamics  simulation  outputs,  WIZER  can 
suggest which stocks, flows, and/or feedbacks need to change. WIZER outputs symbolic
or semantic information about the system dynamics simulation runs.

As an example, Figure B.2 shows a system dynamics diagram of a Factory (Albin 1997):

[Figure: stock-and-flow diagram; recoverable labels include PRICE OF PRODUCT and COST OF PRODUCT]

Figure B.2. Stock and Flow Diagram of a Company

As shown, the Inventory depends on production and shipments, while the
Attractiveness of the Firm depends on the increase in attractiveness. The
Attractiveness of the Firm then positively influences the amount of orders. In the
WIZER knowledge base, the above diagram can be represented as follows. “Negative”
means negatively influencing or decreasing, while “positive” means positively
influencing or increasing.

(causes (positive production) (positive Inventory))
(causes (and (positive Inventory) (positive orders)) (positive shipments))
(causes (positive increase_in_attractiveness) (positive Attractiveness_of_Firm))
(causes (positive delivery_delay) (negative increase_in_attractiveness))
(causes (and (positive orders) (negative Inventory)) (positive delivery_delay))
(causes (and (positive POTENTIALORDERS) (positive Attractiveness_of_Firm))
    (positive orders))
(causes (and (positive PRICE_OF_PRODUCT) (positive shipments)
    (negative COST_OF_PRODUCT)) (positive profit))

The if-then rules are as follows.

(if-then (positive Inventory) (positive production))
(if-then (positive delivery_delay) (and (negative Inventory) (positive orders)))
(if-then (negative delivery_delay) (and (positive Inventory) (negative orders)))
(if-then (negative delivery_delay) (positive (minus Inventory orders)))
(if-then (positive increase_in_attractiveness) (negative delivery_delay))
(if-then (positive orders) (and (positive INITIALORDERS)
    (positive Attractiveness_of_Firm)))
(if-then (positive shipments) (and (positive Inventory) (positive orders)))
(if-then (positive profit) (and (positive PRICE_OF_PRODUCT) (positive shipments)
    (negative COST_OF_PRODUCT)))
(if-then (positive profit) (positive (minus (add PRICE_OF_PRODUCT positive
    shipments) COST_OF_PRODUCT)))
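
To make the reading of the causes statements concrete, the following Python sketch encodes them as data and applies simple forward chaining: whenever all premises of a statement hold, its conclusion is added. This is an assumed, simplified reading for illustration and is not WIZER's inference engine. The names CAUSES and forward_chain are hypothetical, and variable names are written with underscores.

```python
# Each entry: (set of premises, conclusion); a premise is a (sign, variable) pair.
CAUSES = [
    ({("positive", "production")}, ("positive", "Inventory")),
    ({("positive", "Inventory"), ("positive", "orders")}, ("positive", "shipments")),
    ({("positive", "increase_in_attractiveness")},
     ("positive", "Attractiveness_of_Firm")),
    ({("positive", "delivery_delay")}, ("negative", "increase_in_attractiveness")),
    ({("positive", "orders"), ("negative", "Inventory")},
     ("positive", "delivery_delay")),
    ({("positive", "POTENTIALORDERS"), ("positive", "Attractiveness_of_Firm")},
     ("positive", "orders")),
    ({("positive", "PRICE_OF_PRODUCT"), ("positive", "shipments"),
      ("negative", "COST_OF_PRODUCT")}, ("positive", "profit")),
]


def forward_chain(facts, rules=CAUSES):
    """Repeatedly apply rules whose premises all hold until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


result = forward_chain({("positive", "production"), ("positive", "orders")})
print(sorted(result))
```

Starting from increased production and increased orders, the sketch derives increased Inventory and then increased shipments. It does not derive an increased delivery delay, because that statement requires decreased Inventory.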

The processes/mechanisms underlying the above causal diagram are implemented in
mathematical equations, similar to what system dynamics does but with added
semantics via ontology and knowledge bases. The curves resulting from simulating
the system are characterized symbolically by Alert WIZER. These symbolic
characterizations allow the labeling of transition points, which is important for
understanding the system.

Furthermore, when the underlying instantiated data for an individual company is
available, WIZER can ground the abstract working of the system dynamics diagram in
the empirical data. It can facilitate a more detailed symbolic and quantitative
simulation of the Company than the system dynamics simulation. Not all orders are
the same: for a direct-order computer company, for example, the order of a
motherboard is more important financially than the order of a mouse. Some orders
are linked to each other: the order of a graphics card is linked with the order of
a flat screen. All these symbolic facts can be processed by WIZER, but not by
system dynamics.


B.3 Summary


This  appendix  describes  the  fundamentals  of  system  dynamics.  It  also  explains  how 
WIZER  can  augment  system  dynamics  so  that  it  can  have  improved  knowledge 
management,  improved  simulation  management,  and  better  validation.  System  dynamics 
handles  abstract  systems  while  WIZER  can  handle  concrete,  individually  instantiated 
systems  including  their  semantics  or  knowledge  aspect. 


Appendix C. BioWar Ontology and Knowledge Base


The BioWar simulator is complex and evolving. The ontology and knowledge bases
below represent the knowledge in BioWar version 2.0 as of June 2006. For
simplicity, everything is written in the modified N3 notation.

Person:

<person> <has relationships of> <parent, spouse, child, sibling, neighbor> .
<person> <is located in> <a GPS coordinate> .
<person> <lives in> <a school district> .
<person> <has> <demographics> .
<demographics> <has type of> <age, gender, race> .
<age> <consists of> <0-4, 5-16, 17-25, 26-44, 45-55, 56 and more> .
<gender> <consists of> <male, female> .
<race> <consists of> <white, black, hispanics, asian> .
<person> <has> <knowledge vector> .
<knowledge vector> <causes> <interaction> .
<two persons> <has> <proximity> .
<proximity> <causes> <interaction> .
<person> <has> <disease vector> .
<interaction AND disease vector> <causes> <disease transmission> .
<person> <is part of> <ego network> .
<ego network> <causes> <interaction> .
<person> <has health status of>
    <healthy, susceptible, infected, recovered, immune, deceased> .
<symptom severity> <causes> <agent sick behaviors> .
<agent daily behaviors> <has type of> <sleep, at school, at work, in transit,
    at restaurant, at the mall, at cinema, outdoor recreation> .
<agent sick behaviors> <has type of>
    <normal, rest at home, going to pharmacy, going to doctor office,
    going to emergency room> .

Disease:

<disease> <consists of> <attack disease, background disease> .
<disease> <causes> <symptom> .
<disease> <has type of>
    <communicable, noncommunicable, contagious, noncontagious> .
<symptom> <has phase of>
    <infected, communicable, early symptomatic, late symptomatic> .
<symptom> <causes> <symptom severity> .
<symptom severity> <causes>
    <going to pharmacy, going to doctor, going to emergency room> .
<going to pharmacy> <causes> <drug purchase> .
<drug purchase> <has type of> <analgesics, stomach, cough/cold> .
<symptom> <has type of> <fever, sneeze, cough, cold, headache, diarrhea> .
<symptom type> <causes> <specific drug purchase> .
<going to doctor, going to emergency room> <causes> <diagnosis> .
<diagnosis> <causes> <treatment, dismissal> .
<treatment> <causes> <recovery, hospital stay, death> .
<disease> <has instance of> <influenza, anthrax, smallpox, gastroenteritis, ...> .
<anthrax spread> <is influenced by> <wind, sunlight, terrain> .
<wind> <has property of> <speed, direction, atmospheric stability class> .
<initial infection number> <is influenced by>
    <disease type, ailment effective radius, base-rate, weather, number of people> .
<disease exchange> <is influenced by>
    <proximity, ailment exchange proximity radius, ego network, randomness> .
<ego network> <is caused by>
    <initial ego network, homophily, information seeking, knowledge vector> .

Chemical ailment:

<chemical attack> <has type of> <noncontagious, noncommunicable> .
<chemical infliction> <has phase of> <inflicted, symptomatic, post-symptomatic> .
<symptomatic phase> <causes> <treatment> .
<treatment> <causes> <recovery, death> .

Response:

<response> <has type of> <medical response, physical response> .
<medical response> <has type of> <prophylaxis, autoimmune> .
<physical response> <has type of> <isolate, quarantine, evacuate, shelter> .
<physical response> <hinders> <people mobility> .
<people mobility> <causes> <contacts> .
<evacuate> <requires> <cooperation> .
<quarantine> <does not require> <cooperation> .
<shelter> <requires> <cooperation> .
<isolate> <does not require> <cooperation> .
<cooperation> <causes> <success of physical response> .
<physical response> <causes> <infection rate> .


Appendix D. CONSTRUCT Ontology and Knowledge Base

CONSTRUCT is a complex model of co-evolution of cognition (knowledge) and
structure of social networks. It has many derivatives including CONSTRUCT-TM and
DyNet. The following are the ontology and knowledge bases of the original
CONSTRUCT, written in the modified N3.

<interaction> <causes> <knowledge exchange> .
<knowledge exchange> <causes> <shared knowledge> .
<shared knowledge, homophily, information seeking, randomness> <causes>
    <interaction> .
<interaction> <influences> <friendship, enmity> .
<person> <has attributes of> <age, gender, race, economic status, social status> .
<person> <is embedded in> <social/associational network, instrumental network> .
<homophily> <influences> <social/associational network> .
<information seeking> <influences> <instrumental network> .
<interaction> <causes> <unionization> .
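
Statements in this notation have the shape <subject> <predicate> <object> followed by a period, which makes them easy to load into a program. The following Python sketch parses a few of the statements above into (subject, predicate, object) triples and answers a simple query. It assumes that a comma-separated list in the subject or object position stands for one statement per listed item. That reading, and the names parse_statements and subjects_of, are assumptions made for illustration and are not WIZER's actual loader.

```python
import re

# One statement: three angle-bracketed fields followed by a period.
STATEMENT = re.compile(r"<([^<>]+)>\s*<([^<>]+)>\s*<([^<>]+)>\s*\.")


def parse_statements(text):
    """Return (subject, predicate, object) triples, expanding comma-separated lists."""
    triples = []
    for subjects, predicate, objects in STATEMENT.findall(text):
        for subject in subjects.split(","):
            for obj in objects.split(","):
                triples.append((subject.strip(), predicate.strip(), obj.strip()))
    return triples


def subjects_of(triples, predicate, obj):
    """List every subject related to obj by the given predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


SAMPLE = """
<interaction> <causes> <knowledge exchange> .
<knowledge exchange> <causes> <shared knowledge> .
<shared knowledge, homophily, information seeking, randomness> <causes>
    <interaction> .
"""

triples = parse_statements(SAMPLE)
print(subjects_of(triples, "causes", "interaction"))
```

Once the statements are held as triples, a question such as what causes interaction becomes a simple lookup, which is the kind of symbolic access that the ontology and knowledge base are meant to provide.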